Enabling recommendations and community by massively-distributed nearest-neighbor searching

ABSTRACT

The computer associated with each of a potentially large number of end users is harnessed to provide a massively-distributed mechanism for finding the nearest neighbors of each user, according to tastes and/or interests. Once these nearest neighbors are determined, their taste and/or interest profiles are leveraged for highly accurate recommendations, and their online addresses are leveraged for community purposes.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation-in-part of International Patent Application: PCT/US2005/02731, filed 27 Jan. 2005 for Enabling Recommendations and Community By Massively-Distributed Nearest-Neighbor Searching, which claims priority from and benefit of the following U.S. Provisional Patent Applications: 60/540,041 filed 27 Jan. 2004, for Enabling Recommendations and Community by Massively-Distributed Nearest-Neighbor Searching; 60/611,222 filed 18 Sep. 2004 for Community and Recommendation System; and 60/635,197 filed 9 Dec. 2004 for Community and Recommendation System. Applicant hereby claims priority from and benefit of the aforesaid applications 60/611,222 and 60/635,197. Applicant hereby incorporates by reference herein to the fullest extent allowed by law the entire disclosure of each of the aforesaid applications, including all text, drawings, and code whether on paper or machine-readable media.

RESERVATION OF COPYRIGHT

Copyright © 2003, 2004, 2005 Emergent Music LLC

This application includes material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure as it appears in patent office files or records, but otherwise reserves all copyright rights whatsoever.

COMPACT DISC INCORPORATION BY REFERENCE

Applicants hereby incorporate by reference the entire contents of the material on the compact disc submitted concurrently herewith, and as listed below. The disc was created on 17 Sep. 2005. Applicants submit herewith two individual compact discs, each being identical to the other.

Size Mon Day Year File

./cooltunes/Goombah Help:

-   3358 Aug. 31, 2004 bittorrent.html-   3455 Jul. 14, 2004 blogging.html-   3077 Aug. 31, 2004 contacting.html-   2621 Aug. 31, 2004 index.html-   2945 Aug. 31, 2004 install.html-   3097 Jul. 14, 2004 neighbors.html-   2842 Jul. 14, 2004 playingmusic.html-   4019 Aug. 31, 2004 prefs.html-   3086 Aug. 31, 2004 profiles.html-   3046 Jul. 14, 2004 recommendations.html-   2893 Jul. 14, 2004 suggestions.html-   2819 Jul. 14, 2004 terms.html-   2665 Jul. 14, 2004 upgrades.html-   5738 Jul. 27, 2004 web.html-   3145 Jul. 14, 2004 whatisit.html    ./cooltunes/Goombah Help/images:-   (empty)    ./cooltunes/bittorrent:-   6255 Sep. 17, 2004 bittorrentfetcherclass.py-   2001 Sep. 17, 2004 bittorrentfetcherclasstest.py-   1769 Jul. 15, 2004 bittorrenttimedfetcherclass.py-   1705 Jul. 15, 2004 bittorrenttimedfetcherclasstest.py-   2022 Aug. 30, 2004 btcreatetorrentsmain.py-   5061 Aug. 30, 2004 btdirectoryclass.py-   1112 Aug. 30, 2004 btdirectoryclassmain.py-   10396 Aug. 30, 2004 btdirectoryclasstest.py-   4786 Sep. 2, 2004 btseedmanagerclass.py-   5462 Aug. 30, 2004 btseedmanagerclasstest.py-   2346 Aug. 11, 2004 btseedsdaemonmain.py-   1149 Aug. 11, 2004 bttrackdaemonmain.py    ./cooltunes/clustering:-   7780 Aug. 20, 2004 clusterbuilderclass.py-   1877 Aug. 26, 2004 clusterbuilderclassmain.py-   7512 Aug. 20, 2004 clusterbuilderclasstest.py-   17139 Jul. 13, 2004 clusterfitterclass.py-   12368 Jul. 13, 2004 clusterfitterclasstest.py-   1876 Aug. 13, 2004 clusteringcandidatesfileclass.py-   4060 Jun. 9, 2004 clusteringcandidatesfileclasstest.py-   1835 Jun. 9, 2004 extremes.py-   38421 Jul. 16, 2004 genrerankhandlerclass.py-   10305 Jul. 13, 2004 genrerankhandlerclassrefactorings.txt-   102592 Jul. 13, 2004 genrerankhandlerclasstest.py-   1778 Jun. 2, 2004 publicprofilestringclass.py-   1089 Jun. 2, 2004 publicprofilestringclasstest.py-   2315 Jun. 9, 2004 unique.py    ./cooltunes/getblogurl:-   5715 Jul. 
26, 2004 getblogurlclass.py    ./cooltunes/initialcandidates:-   421 Jun. 22, 2004 ReadMe.Candidates.txt    ./cooltunes/libchanges/BitTorrent:-   47 Jun. 3, 2004 BitTorrentVersion-3.4.2.txt-   9955 Jun. 3, 2004 Choker.py-   10802 Jun. 3, 2004 Connecter.py-   1016 Jun. 3, 2004 CurrentRateMeasure.py-   17734 Jun. 3, 2004 Downloader.py-   2855 Jun. 3, 2004 DownloaderFeedback.py-   18789 Jun. 3, 2004 Encrypter.py-   6054 Jun. 3, 2004 HTTPHandler.py-   2650 Jun. 3, 2004 NatCheck.py-   5014 Jun. 3, 2004 PiecePicker.py-   1503 Jun. 3, 2004 RateMeasure.py-   18347 Jun. 3, 2004 RawServer.py-   5653 Jun. 3, 2004 Rerequester.py-   5021 Jun. 3, 2004 Storage.py-   17029 Jun. 3, 2004 StorageWrapper.py-   8059 Jun. 3, 2004 Uploader.py-   18 Jun. 3, 2004 _init_.py-   7052 Jun. 3, 2004 bencode.py-   3733 Jun. 3, 2004 bitfield.py-   3831 Jun. 3, 2004 btformats.py-   12791 Jun. 3, 2004 download.py-   2240 Jun. 3, 2004 fakeopen.py-   3579 Jun. 3, 2004 parseargs.py-   2287 Jun. 3, 2004 selectpoll.py-   2052 Jun. 3, 2004 testtest.py-   24605 Jun. 3, 2004 track.py-   4261 Jun. 3, 2004 zurllib.py    ./cooltunes/libchanges/macos/xml:-   1806 Apr. 5, 2004 FtCore.py-   389 Apr. 5, 2004 ReadMe.Bob.txt-   1083 Apr. 5, 2004 _init_.py-   9627 Apr. 5, 2004 ns.py    ./cooltunes/libchanges/macos/xml/dom:-   4235 Apr. 5, 2004 Attr.py-   644 Apr. 5, 2004 CDATASection.py-   4094 Apr. 5, 2004 CharacterData.py-   603 Apr. 5, 2004 Comment.py-   1936 Apr. 5, 2004 DOMImplementation.py-   11948 Apr. 5, 2004 Document.py-   1296 Apr. 5, 2004 DocumentFragment.py-   3399 Apr. 5, 2004 DocumentType.py-   10264 Apr. 5, 2004 Element.py-   2610 Apr. 5, 2004 Entity.py-   1394 Apr. 5, 2004 EntityReference.py-   3438 Apr. 5, 2004 Event.py-   16628 Apr. 5, 2004 FtNode.py-   2259 Apr. 5, 2004 MessageSource.py-   5052 Apr. 5, 2004 NamedNodeMap.py-   937 Apr. 5, 2004 NodeFilter.py-   3998 Apr. 5, 2004 Nodelterator.py-   1442 Apr. 5, 2004 NodeList.py-   2056 Apr. 5, 2004 Notation.py-   2080 Apr. 
5, 2004 Processinglnstruction.py-   40190 Apr. 5, 2004 Range.py-   1195 Apr. 5, 2004 Text.py-   6995 Apr. 5, 2004 TreeWalker.py-   7545 Apr. 5, 2004 _init_.py-   3481 Apr. 5, 2004 domreg.py-   36379 Apr. 5, 2004 expatbuilder.py-   19289 Apr. 5, 2004 javadom.py-   5287 Apr. 5, 2004 minicompat.py-   65671 Apr. 5, 2004 minidom.py-   1274 Apr. 5, 2004 minitraversal.py-   11978 Apr. 5, 2004 pulldom.py-   12384 Apr. 5, 2004 xmlbuilder.py    ./cooltunes/libchanges/macos/xml/dom/ext:-   11057 Apr. 5, 2004 Dom2Sax.py-   13835 Apr. 5, 2004 Printer.py-   2344 Apr. 5, 2004 Visitor.py-   1584 Apr. 5, 2004 XHtml2HtmlPrinter.py-   1634 Apr. 5, 2004 XHtmlPrinter.py-   10102 Apr. 5, 2004 _init_.py-   13186 Apr. 5, 2004 cl4n.py    ./cooltunes/libchanges/macos/xml/dom/ext/reader:-   3174 Apr. 5, 2004 HtmlLib.py-   3123 Apr. 5, 2004 HtmlSax.py-   8871 Apr. 5, 2004 PyExpat.py-   6381 Apr. 5, 2004 Sax.py-   15985 Apr. 5, 2004 Sax2.py-   8295 Apr. 5, 2004 Sax2Lib.py-   10310 Apr. 5, 2004 Sgmlop.py-   2207 Apr. 5, 2004 _init_.py    ./cooltunes/libchanges/macos/xml/dom/html:-   9836 Apr. 5, 2004 GenerateHtml.py-   3788 Apr. 5, 2004 HTMLAnchorElement.py-   3411 Apr. 5, 2004 HTMLAppletElement.py-   2959 Apr. 5, 2004 HTMLAreaElernent.py-   1309 Apr. 5, 2004 HTMLBRElement.py-   1501 Apr. 5, 2004 HTMLBaseElement.py-   1702 Apr. 5, 2004 HTMLBaseFontElement.py-   2361 Apr. 5, 2004 HTMLBodyElement.py-   2686 Apr. 5, 2004 HTMLButtonElement.py-   2175 Apr. 5, 2004 HTMLCollection.py-   1396 Apr. 5, 2004 HTMLDListElement.py-   1047 Apr. 5, 2004 HTMLDOMImplementation.py-   1405 Apr. 5, 2004 HTMLDirectoryElement.py-   1509 Apr. 5, 2004 HTMLDivElement.py-   11633 Apr. 5, 2004 HTMLDocument.py-   3572 Apr. 5, 2004 HTMLElement.py-   1299 Apr. 5, 2004 HTMLFieldSetElement.py-   1690 Apr. 5, 2004 HTMLFontElement.py-   3327 Apr. 5, 2004 HTMLFormElement.py-   3564 Apr. 5, 2004 HTMLFrameElement.py-   1497 Apr. 5, 2004 HTMLFrameSetElement.py-   2016 Apr. 5, 2004 HTMLHRElement.py-   1312 Apr. 
5, 2004 HTMLHeadElement.py-   1314 Apr. 5, 2004 HTMLHeadingElement.py-   1312 Apr. 5, 2004 HTMLHtmlElement.py-   3894 Apr. 5, 2004 HTMLIFrameElement.py-   3888 Apr. 5, 2004 HTMLhmageElement.py-   6481 Apr. 5, 2004 HTMLInputElement.py-   1553 Apr. 5, 2004 HTMLIsIndexElement.py-   1558 Apr. 5, 2004 HTMLLIElement.py-   1784 Apr. 5, 2004 HTMLLabelElement.py-   1798 Apr. 5, 2004 HTMLLegendElement.py-   3046 Apr. 5, 2004 HTMLLinkElement.py-   1377 Apr. 5, 2004 HTMLMapElement.py-   1396 Apr. 5, 2004 HTMLMenuElement.py-   1961 Apr. 5, 2004 HTMLMetaElement.py-   1514 Apr. 5, 2004 HTMLModElement.py-   1869 Apr. 5, 2004 HTMLOListElement.py-   5233 Apr. 5, 2004 HTMLObjectElement.py-   1623 Apr. 5, 2004 HTMLOptGroupElement.py-   3651 Apr. 5, 2004 HTMLOptionElement.py-   1322 Apr. 5, 2004 HTMLParagraphElement.py-   1949 Apr. 5, 2004 HTMLParamElement.py-   1364 Apr. 5, 2004 HTMLPreElement.py-   1283 Apr. 5, 2004 HTMLQuoteElement.py-   3150 Apr. 5, 2004 HTMLScriptElement.py-   4750 Apr. 5, 2004 HTMLSelectElement.py-   811 Apr. 5, 2004 HTMLStyleElement.py-   1334 Apr. 5, 2004 HTMLTableCaptionElement.py-   4684 Apr. 5, 2004 HTMLTableCellElement.py-   2421 Apr. 5, 2004 HTMLTableColElement.py-   9117 Apr. 5, 2004 HTMLTableElement.py-   3711 Apr. 5, 2004 HTMLTableRowElement.py-   2877 Apr. 5, 2004 HTMLTableSectionElement.py-   4989 Apr. 5, 2004 HTMLTextAreaElement.py-   1837 Apr. 5, 2004 HTMLTitleElement.py-   1612 Apr. 5, 2004 HTMLUListElement.py-   36479 Apr. 5, 2004 _init_.py    ./cooltunes/libchanges/macos/xml/marshal:-   359 Apr. 5, 2004 _init_.py-   20344 Apr. 5, 2004 generic.py-   10023 Apr. 5, 2004 wddx.py    ./cooltunes/libchanges/macos/xml/parsers:-   43 Apr. 5, 2004 _init_.py-   116 Apr. 5, 2004 expat.py-   19361 Apr. 5, 2004 sgmllib.py    ./cooltunes/libchanges/macos/xml/parsers/xmlproc:-   22 Apr. 5, 2004 _init_.py-   1657 Apr. 5, 2004 _outputters.py-   10134 Apr. 5, 2004 catalog.py-   6593 Apr. 5, 2004 charconv.py-   22875 Apr. 5, 2004 dtdparser.py-   33805 Apr. 
5, 2004 errors.py-   4852 Apr. 5, 2004 namespace.py-   6752 Apr. 5, 2004 utils.py-   2340 Apr. 5, 2004 xcatalog.py-   7067 Apr. 5, 2004 xmlapp.py-   28475 Apr. 5, 2004 xmldtd.py-   19970 Apr. 5, 2004 xmlproc.py-   32619 Apr. 5, 2004 xmlutils.py-   10167 Apr. 5, 2004 xmlval.py    ./cooltunes/libchanges/macos/xml/sax:-   1602 Apr. 5, 2004 _init_.py-   4662 Apr. 5, 2004 _exceptions.py-   15122 Apr. 5, 2004 expatreader.py-   14084 Apr. 5, 2004 handler.py-   1250 Apr. 5, 2004 sax2exts.py-   6617 Apr. 5, 2004 saxexts.py-   15687 Apr. 5, 2004 saxlib.py-   24428 Apr. 5, 2004 saxutils.py-   18864 Apr. 5, 2004 writer.py-   12580 Apr. 5, 2004 xmlreader.py    ./cooltunes/libchanges/macos/xml/sax/drivers:-   39 Apr. 5, 2004 _init_.py-   1051 Apr. 5, 2004 drv_htmllib.py-   3112 Apr. 5, 2004 drv_ltdriver.py-   895 Apr. 5, 2004 drv_ltdriver_val.py-   5893 Apr. 5, 2004 drv_pyexpat.py-   979 Apr. 5, 2004 drv_sgmllib.py-   2700 Apr. 5, 2004 drv_sgmlop.py-   3685 Apr. 5, 2004 drv_xmldc.py-   2709 Apr. 5, 2004 drv_xmllib.py-   4402 Apr. 5, 2004 drv_xmlproc.py-   1774 Apr. 5, 2004 drv_xmlproc_val.py-   2509 Apr. 5, 2004 drv_xmltoolkit.py-   3393 Apr. 5, 2004 pylibs.py    ./cooltunes/libchanges/macos/xml/sax/drivers2:-   39 Apr. 5, 2004 _init_.py-   422 Apr. 5, 2004 drv_htmllib.py-   5931 Apr. 5, 2004 drv_javasax.py-   645 Apr. 5, 2004 drv_pyexpat.py-   3759 Apr. 5, 2004 drv_sgmllib.py-   4386 Apr. 5, 2004 drv_sgmlop.py-   2467 Apr. 5, 2004 drv_sgmlop_html.py-   13532 Apr. 5, 2004 drv_xmlproc.py    ./cooltunes/libchanges/macos/xml/schema:-   38 Apr. 5, 2004 _init_.py-   60039 Apr. 5, 2004 trex.py    ./cooltunes/libchanges/macos/xml/unicode:-   158 Apr. 5, 2004 _init_.py-   2863 Apr. 5, 2004 iso8859.py-   11690 Apr. 5, 2004 utf8_iso.py    ./cooltunes/libchanges/macos/xml/utils:-   22 Apr. 5, 2004 _init_.py-   26221 Apr. 5, 2004 characters.py-   5676 Apr. 5, 2004 iso8601.py-   6160 Apr. 5, 2004 qp_xm.py    ./cooltunes/libchanges/macos/xml/xpath:-   9457 Apr. 
5, 2004 BuiltInExtFunctions.py-   2193 Apr. 5, 2004 Context.py-   5865 Apr. 5, 2004 Conversions.py-   11233 Apr. 5, 2004 CoreFunctions.py-   1159 Apr. 5, 2004 ExpandedNameWrapper.py-   996 Apr. 5, 2004 MessageSource.py-   757 Apr. 5, 2004 NamespaceNode.py-   2047 Apr. 5, 2004 ParsedAbbreviatedAbsoluteLocationPath.py-   2137 Apr. 5, 2004 ParsedAbbreviatedRelativeLocationPath.py-   1228 Apr. 5, 2004 ParsedAbsoluteLocationPath.py-   9080 Apr. 5, 2004 ParsedAxisSpecifier.py-   21415 Apr. 5, 2004 ParsedExpr.py-   5443 Apr. 5, 2004 ParsedNodeTest.py-   2483 Apr. 5, 2004 ParsedPredicateList.py-   1464 Apr. 5, 2004 ParsedRelativeLocationPath.py-   3414 Apr. 5, 2004 ParsedStep.py-   951 Apr. 5, 2004 Set.py-   6005 Apr. 5, 2004 Util.py-   34402 Apr. 5, 2004 XPathGrammar.py-   37104 Apr. 5, 2004 XPathParser.py-   2924 Apr. 5, 2004 XPathParserBase.py-   3192 Apr. 5, 2004 _init_.py-   11280 Apr. 5, 2004 pyxpath.py-   6236 Apr. 5, 2004 yappsrt.py    ./cooltunes/macui:-   993 Apr. 5, 2004 Building_coolTunes.txt-   528 Apr. 5, 2004 Credits.html-   2142 May 10, 2004 alertdialogclasstestmanually.py-   3022 Jul. 26, 2004 buildapp.py-   144 May 10, 2004 buildapp_alertdialgoclasstestscript.py-   171 Jul. 21, 2004 buildapp_macpleasewaitdialogclasstestscript.py-   143 Jun. 23, 2004 buildapp_opendialogclasstestscript.py-   164 Apr. 16, 2004 buildapp_progressdialogclasstestscript.py-   4232 Jun. 22, 2004 builddiskimage.sh-   310 Apr. 5, 2004 clearuser.sh-   134519 Sep. 17, 2004 cooltunescontrollerclass.py-   277 Apr. 6, 2004 cooltunescontrollerclasstest.py-   763 Jul. 26, 2004 cooltunesmain.py-   6491 Apr. 16, 2004 itunesdbreaderclass.py-   845 Jul. 26, 2004 itunesdbreaderprogressslavemain.py-   10655 Sep. 13, 2004 itunesdbreaderslaveclass.py-   859 Jul. 26, 2004 itunesdbreaderslavemain.py-   3183 Apr. 5, 2004 itunesscripterclass.py-   1134 May 10, 2004 macalertdialogclass.py-   1009 Jun. 23, 2004 macopendialogclass.py-   3874 Jul. 21, 2004 macpleasewaitdialogclass.py-   1644 Aug. 
20, 2004 macpleasewaitdialogclasstest.py-   5411 May 10, 2004 macprogressdialogclass.py-   1110 Apr. 16, 2004 nibutilities.py-   1533 Jun. 23, 2004 opendialogclasstestmanually.py-   13767 Jul. 26, 2004 preffileclass.py-   1189 Jun. 4, 2004 preffileclasstest.py-   1595 Apr. 16, 2004 progressdialogclasstest.py    ./cooltunes/patterns:-   1497 Jun. 24, 2004 immutablelistclass.py-   6695 Apr. 5, 2004 observennixin.py-   8089 Apr. 5, 2004 older-   8143 Apr. 5, 2004 persistencemixin.py-   4347 Apr. 5, 2004 singletonautopersistence.py-   8642 Jun. 10, 2004 singletonmixin.py-   4355 Apr. 5, 2004 synchronization.py    ./cooltunes/pyclient:-   850 Jun. 11, 2004 alertdialogclass.py-   6829 Jul. 23, 2004 candidatefileclass.py-   11120 Aug. 20, 2004 candidatefileclasstest.py-   12624 Sep. 17, 2004 candidatefilefetcherclass.py-   9823 Sep. 16, 2004 candidatefilefetcherclasstest.py-   1869 May 13, 2004 candidatefileneighborretrieverclass.py-   3502 May 13, 2004 candidatefileneighborretrieverclasstest.py-   397 Jul. 23, 2004 clientemail.py-   593 Jul. 23, 2004 clientemailtest.py-   42266 Sep. 17, 2004 cooltunesclass.py-   15103 Jul. 29, 2004 cooltunesclasstest.py-   26 Jun. 22, 2004 cooltunesversion.py-   3578 Apr. 6, 2004 currentclientversionclass.py-   3899 Apr. 14, 2004 currentclientversionclasstest.py-   5476 Aug. 10, 2004 daemonize.py-   4235 Aug. 20, 2004 errorloggerclass.py-   3000 Apr. 14, 2004 errorloggerclasstest.py-   3084 Aug. 2, 2004 filteredreconmmenderclass.py-   4745 Aug. 2, 2004 filteredrecommenderclasstest.py-   1042 Jun. 9, 2004 genreprofilerclass.py-   1207 Jun. 9, 2004 genreprofilerclasstest.py-   17125 Aug. 25, 2004 goombahserverclass.py-   25439 Aug. 20, 2004 goombahserverclasstest.py-   1991 Apr. 6, 2004 heartbeatclass.py-   1498 Apr. 14, 2004 heartbeatclasstest.py-   776 Apr. 6, 2004 listutilities.py-   2015 Apr. 6, 2004 listutilitiestest.py-   10666 Jul. 27, 2004 musicurlclass.py-   10843 Jul. 27, 2004 musicurlclasstest.py-   6362 Jul. 
16, 2004 neighborbagclass.py-   4886 May 12, 2004 neighborclass.py-   15170 Sep. 17, 2004 neighborscannerclass.py-   11853 Sep. 17, 2004 neighborscannerclasstest.py-   863 Jul. 26, 2004 neighborscannerprogressslavemain.py-   6191 Sep. 16, 2004 neighborscannerslaveclass.py-   259 Apr. 6, 2004 neighborscannerslaveclasstest.py-   875 Jul. 26, 2004 neighborscannerslavemain.py-   5326 Apr. 16, 2004 neighborsearcherclass.py-   39195 Jul. 26, 2004 neighborsearcherslaveclass.py-   4868 Aug. 10, 2004 normalize.py-   49 Jul. 15, 2004 normalizefastcompile.sh-   278 Jul. 15, 2004 normalizefastsetup.py-   5121 Jul. 15, 2004 normalizefasttest.py-   7182 Jul. 15, 2004 nonnalizetest.py-   4821 Jun. 7, 2004 onewayfileclass.py-   1218 Apr. 14, 2004 onewayfileclasstest.py-   612 Jun. 23, 2004 opendialogclass.py-   15710 Jul. 16, 2004 openexclusive.py-   5089 Apr. 6, 2004 openexclusivetest.py-   3928 Apr. 8, 2004 picklepipeclass.py-   6662 Apr. 14, 2004 picklepipeclasstest.py-   706 Apr. 8, 2004 picklepipeclasstestwriter.py-   1316 Jul. 21, 2004 pleasewaitdialogclass.py-   24019 Jul. 21, 2004 plisthandlerclass.py-   5930 Jul. 21, 2004 plisthandlerclasstest.py-   3654 Jul. 21, 2004 processprogressclass.py-   3248 Jul. 21, 2004 processprogressclasstest.py-   4145 May 7, 2004 progressdialogclass.py-   39746 Aug. 2, 2004 recommenderclass.py-   11977 Aug. 5, 2004 recommenderhandlerclass.py-   12262 Jul. 14, 2004 recommenderhandlerclasstest.py-   13947 Jul. 26, 2004 slaveprocessclass.py-   1744 Jun. 15, 2004 slaveprocessclasstest.py-   3539 Jun. 4, 2004 sortedneighborlistclass.py-   4951 Jun. 4, 2004 sortedneighborlistclasstest.py-   50519 Aug. 5, 2004 tasteprofileclass.py-   572 Jul. 23, 2004 tasteprofileclassrefactorings.txt-   4460 Aug. 5, 2004 tasteprofileclasstest.py-   1269 Apr. 14, 2004 test.py-   436 Jun. 15, 2004 testidlerclass.py-   5001 Sep. 17, 2004 timeutilities.py-   7047 Sep. 17, 2004 timeutilitiestest.py-   9366 Apr. 5, 2004 traceclass.py-   336 Aug. 
4, 2004 transposeexceptions.py-   4503 Jul. 29, 2004 userclass.py-   208 Jul. 29, 2004 userclasstest.py-   7902 Apr. 5, 2004 userdefaultsclass.py-   4794 Aug. 13, 2004 userpathsclass.py-   3701 Jul. 26, 2004 userpathsclasstest.py-   12875 Aug. 20, 2004 utilities.py-   13770 Jun. 7, 2004 utilitiestest.py-   6607 Jul. 26, 2004 versioncheckerclass.py-   3728 Sep. 17, 2004 viewfactoryclass.py-   2267 Apr. 5, 2004 build.xml-   2392 Jun. 23, 2004 web.xml    ./cooltunes/webserver/WEB-INF/conf:-   37329 Jul. 28, 2004 TurbineResources.properties    ./cooltunes/webserver/WEB-INF/lib:-   (empty)    ./cooltunes/webserver/database:-   2802 Aug. 11, 2004 MysqlSchema.sql-   309 Apr. 5, 2004 backup-goo.sh    ./cooltunes/webserver/java/com/transpose/cooltunes:-   3912 Apr. 5, 2004 BlogList.java-   5501 Aug. 11, 2004 BlogPostList.java-   863 Apr. 5, 2004 CTBlog.java-   4515 Apr. 5, 2004 CTBlogPost.java-   2230 Jun. 2, 2004 ClusteringCandidatesFileWriter.java-   1151 Jun. 2, 2004 ClusteringCandidatesSaver.java-   983 Apr. 5, 2004 GeneralComment.java-   7257 Aug. 11, 2004 GeneralCommentList.java-   7811 Apr. 5, 2004 NearestNeighbor.java-   6481 Apr. 5, 2004 News.java-   3223 Apr. 5, 2004 NewsList.java-   8129 Jun. 2, 2004 PublicProfile.java-   17063 Aug. 20, 2004 RPC2Handler.java-   14403 Jun. 2, 2004 User.java    ./cooltunes/webserver/java/com/transpose/cooltunes/servlets:-   764 Apr. 5, 2004 AppInit.java-   7013 Apr. 5, 2004 BlogServlet.java-   4432 Apr. 5, 2004 GeneralCommentServlet.java-   671 Apr. 5, 2004 HelloWorld.java-   6371 Apr. 5, 2004 LoginServlet.java-   1277 Apr. 5, 2004 RPC2.java-   6611 Aug. 11, 2004 UserServlet.java    ./cooltunes/webserver/java/com/transpose/libs:-   (empty)    ./cooltunes/webserver/java/com/transpose/util:-   321 Apr. 5, 2004 KeyNotFoundException.java-   1026 Apr. 5, 2004 Mailer.java-   1313 Apr. 5, 2004 XmlRpcFault.java    ./cooltunes/webserver/jsps:-   2088 Sep. 16, 2004 about.jsp-   1621 Apr. 5, 2004 blogitem.jsp-   953 Apr. 
5, 2004 blogs.jsp-   28831 Apr. 5, 2004 clickwrap.jsp-   1345 Sep. 13, 2004 contact.jsp-   1550 Apr. 5, 2004 createblog.jsp-   2600 Apr. 5, 2004 createuser.jsp-   360 Apr. 5, 2004 dbtest.jsp-   1551 Sep. 13, 2004 discussion.jsp-   3402 Sep. 13, 2004 download.jsp-   11304 Apr. 5, 2004 editblog.jsp-   7628 Sep. 13, 2004 faq.jsp-   308 Sep. 13, 2004 getNumUsers.jsp-   2169 Sep. 13, 2004 index.jsp-   1899 Sep. 13, 2004 login.jsp-   520 Apr. 5, 2004 logout.jsp-   1028 Apr. 5, 2004 mailpassword.jsp-   1656 Sep. 13, 2004 privacy.jsp-   2612 Apr. 5, 2004 releases.jsp-   1008 Apr. 5, 2004 send_verification.jsp-   2528 Apr. 5, 2004 startdiscussion.jsp-   1045 Apr. 5, 2004 style.css-   394 Jun. 2, 2004 test.jsp-   293 Jun. 2, 2004 testclusteringcandidates.jsp-   3139 Sep. 13, 2004 tos.jsp-   1078 Apr. 5, 2004 verify.jsp-   171 Apr. 5, 2004 verify_mailed.jsp-   6687 Apr. 5, 2004 viewblog.jsp-   962 Apr. 5, 2004 viewblogbyuser.jsp-   3340 Apr. 5, 2004 viewdiscussion.jsp-   3371 Apr. 5, 2004 viewforum.jsp    ./cooltunes/webserver/jsps/images:-   (empty)    ./cooltunes/webserver/jsps/includes:-   0 Apr. 5, 2004 announcement.jsp-   1706 Sep. 13, 2004 beginbody.jsp-   731 Sep. 13, 2004 endbody.jsp-   0 Apr. 5, 2004 footer.jsp-   0 Apr. 5, 2004 header.jsp-   455 Apr. 5, 2004 jspheader.jsp-   2017 Sep. 
17, 2004 build.xml-   5524 May 20, 2004 web.xml    ./songsifter/WEB-INF/conf:-   38247 May 20, 2004 TurbineResources.properties    ./songsifter/WEB-INF/tlds:-   (empty)    ./songsifter/database:-   13853 May 20, 2004 DemoSchema.sql-   5377 May 20, 2004 MusicNewsSchema.sql-   13306 May 20, 2004 MysqlSchema.sql-   702 May 20, 2004 NewsSchema.sql-   1906 May 20, 2004 OracleClearData.sql-   185 May 20, 2004 OracleEMCreator.sql-   1975 May 20, 2004 OracleFixSequences.sql-   3829 May 20, 2004 OracleInitValues.sql-   3132 May 20, 2004 OracleJDBCUser.sql-   14427 May 20, 2004 OracleSchema.sql-   625 May 20, 2004 RepairCTXIndexes.sql-   5450 May 20, 2004 SuggestionSchema.sql-   214 May 20, 2004 oraclecommands.txt-   340 May 20, 2004 savepoints.sql-   1082 May 20, 2004 seq.temp.sql    ./songsifter/java/com/transpose:-   780 May 20, 2004 Makefile-   1774 May 20, 2004 Makefile.include    ./songsifter/java/com/transpose/deed:-   6160 May 20, 2004 AuctionItem.java-   28528 May 20, 2004 BackgroundInfo.java-   6881 May 20, 2004 BestDeedList.java-   4896 May 20, 2004 Bid.java-   2520 May 20, 2004 Blog.java-   4856 May 20, 2004 BlogIDFanID.java-   10927 Aug. 11, 2004 BlogPost.java-   976 May 20, 2004 ChangedBestDeedList.java-   575 May 20, 2004 ChangedDeedList.java-   5094 May 20, 2004 ClickThru.java-   10806 May 20, 2004 DBTableNames.java-   383 May 20, 2004 DBTableSelector.java-   41674 May 20, 2004 Deed.java-   12631 May 20, 2004 DeedAndChildList.java-   8487 May 20, 2004 DeedComment.java-   922 May 20, 2004 DeedIDAndLevel.java-   973 May 20, 2004 DeedList.java-   11195 May 20, 2004 DeedListImplementor.java-   19138 May 20, 2004 DeedRating.java-   3576 May 20, 2004 DeedRatingTable.java-   2501 May 20, 2004 DeedTable.java-   7493 May 20, 2004 Deed_Fan.java-   15610 Aug. 
11, 2004 DiscussionComment.java-   1419 May 20, 2004 FanDeedList.java-   4467 May 20, 2004 Forum.java-   3781 May 20, 2004 ForumList.java-   3866 May 20, 2004 K2Factory.java-   5387 May 24, 2004 K2User.java-   3236 May 20, 2004 K2UserList.java-   3937 May 20, 2004 K2UserOption.java-   4497 May 20, 2004 K2UserPoints.java-   26110 May 20, 2004 K2UserValue.java-   4248 May 20, 2004 K2UserValueTable.java-   3880 May 20, 2004 MailingList.java-   249 May 20, 2004 Makefile-   4292 May 20, 2004 NeedRatingDeedListlmplementor.java-   264 May 20, 2004 NotEnoughPointsException.java-   4883 May 20, 2004 NotifyEvent.java-   10955 May 20, 2004 PointsChange.java-   5100 May 20, 2004 PointsChangeTable.java-   249 May 20, 2004 Searchable.java-   2627 May 20, 2004 SearchableDeedListImplementor.java-   18586 May 20, 2004 Topic.java-   8916 May 20, 2004 TopicComment.java-   2714 May 20, 2004 TopicTable.java    ./songsifter/java/com/transpose/deed/servlets:-   5009 May 20, 2004 DeedServlet.java-   3046 May 20, 2004 EditDeedServlet.java-   250 May 20, 2004 Makefile-   4993 May 20, 2004 ModeratorCommentServlet.java-   1151 May 20, 2004 ServletParameterException.java-   3938 May 20, 2004 StoreDeedServlet.java    ./songsifter/java/com/transpose/deed/test:-   580 May 20, 2004 testclickthru.Jsp-   817 May 20, 2004 testcounts.jsp-   1128 May 20, 2004 testdeednumbers.jsp-   1465 May 20, 2004 testdeedsforfan.jsp-   1376 May 20, 2004 testhistory.jsp-   930 May 20, 2004 testlatest.jsp-   1996 May 20, 2004 testneediest.jsp-   611 May 20, 2004 testoriginaldeed.jsp-   1057 May 20, 2004 testresetbest.jsp    ./songsifter/java/com/transpose/k2math:-   293 May 20, 2004 InconsistentDataException.java-   26034 May 20, 2004 K2MathClass.java-   243 May 20, 2004 Makefile-   345 May 20, 2004 NotEnoughDataException.java-   203 May 20, 2004 PleaseStopException.java-   22809 May 20, 2004 ProcessBackgroundRatingCutoffs.java-   78082 May 20, 2004 ProcessDirtyDeedRatings.java-   19116 May 20, 2004 
ReinitializeMath.java    ./songsifter/java/com/transpose/libs:-   (empty)    ./songsifter/java/com/transpose/my:-   5616 May 20, 2004 Affinity.java-   2440 May 20, 2004 EmailAFriendTopic.java-   4849 May 20, 2004 Fan_Affinity.java-   3725 May 20, 2004 Fan_AffinityList.java-   6295 May 20, 2004 K2MYFactory.java-   4595 May 20, 2004 Login.java-   885 May 20, 2004 MYBackgroundInfo.java-   4886 May 20, 2004 MYBestDeedList.java-   1313 May 20, 2004 MYChangedBestDeedList.java-   1302 May 20, 2004 MYChangedDeedList.java-   10778 May 20, 2004 MYDeed.java-   493 May 20, 2004 MYDeedList.java-   2638 May 20, 2004 MYDeedListImplementor.java-   1353 May 20, 2004 MYDeedRating.java-   3481 May 20, 2004 MYFan.java-   6064 May 20, 2004 MYFanList.java-   4092 May 20, 2004 MYFanOption.java-   919 May 20, 2004 MYFanValue.java-   680 May 20, 2004 MYPointsChange.java-   2392 May 20, 2004 MYScheduledTasks.java-   9724 May 20, 2004 MYTopic.java-   997 May 20, 2004 MYTopicComment.java-   1172 May 20, 2004 MYUser.java-   245 May 20, 2004 Makefile-   537 May 20, 2004 ProcessBackgroundMYRatingCutoffs.java-   656 May 20, 2004 ProcessDirtyMYDeedRatings.java-   1066 May 20, 2004 ProcessDirtyMYDeedRatingsScheduledTask.java-   1055 May 20, 2004 ProcessMYBGlnfoScheduledTask.java    ./songsifter/java/com/transpose/my/servlets:-   797 May 20, 2004 AppInit.java-   18270 May 20, 2004 CreateMYDeedServlet.java-   8622 May 20, 2004 CreateMYPersonServlet.java-   5498 May 20, 2004 DeedRatingServlet.java-   4718 May 20, 2004 EditMYDeedServlet.java-   28926 May 20, 2004 FanServlet.java-   7032 May 20, 2004 LoginServlet.java-   3057 May 20, 2004 MYTopicCommentServlet.java-   248 May 20, 2004 Makefile-   4487 May 20, 2004 StoreMYDeedServlet.java-   4226 May 20, 2004 UploadMYPictureServlet.java    ./songsifter/java/com/transpose/scheduledjobs:-   2707 May 20, 2004 JobMinder.java-   939 May 20, 2004 JobMinderScheduledTask.java-   247 May 20, 2004 Makefile-   326 May 20, 2004 PoliteRunnable.java-   9403 May 20, 
2004 ScheduledTask.java-   1746 May 20, 2004 ScheduledTaskList.java-   1774 May 20, 2004 TestScheduledTask.java    ./songsifter/java/com/transpose/songdeed:-   12702 May 20, 2004 AlbumAuctionItem.java-   7716 May 20, 2004 AlbumAuctionItemList.java-   1026 May 20, 2004 AlbumBid.java-   4161 May 20, 2004 Announcement.java-   3736 May 20, 2004 ArtistList.java-   2894 Aug. 11, 2004 ArtistWeeklyEmailMessage.java-   4860 May 20, 2004 BlogSongs.java-   927 May 20, 2004 BlogSongsScheduledTask.java-   4827 May 20, 2004 EMScheduledTasks.java-   5051 May 20, 2004 EmaiLAFriend.java-   2519 May 20, 2004 EmaiLAFriendTopic.java-   25661 Aug. 11, 2004 Fan.java-   6681 May 20, 2004 FanList.java-   4076 May 20, 2004 FanOption.java-   11085 May 20, 2004 FanSongPointsChangesList.java-   4705 May 20, 2004 Fan_Genre.java-   3905 May 20, 2004 Fan_GenreList.java-   996 May 20, 2004 GeneralComment.java-   6984 Aug. 11, 2004 GeneralCommnentList.java-   4107 May 20, 2004 Genre.java-   6712 May 20, 2004 K2SongFactory.java-   458 May 20, 2004 LinkEntry.java-   4682 May 20, 2004 Login.java-   4397 May 20, 2004 LoginList.java-   251 May 20, 2004 Makefile-   1616 May 20, 2004 NeedRatingDeedList.java-   6044 May 20, 2004 News.java-   3349 May 20, 2004 NewsList.java-   5396 May 20, 2004 ProcessArtistWeeklyPromotionEmail.java-   1203 May 20, 2004    ProcessArtistWeeklyPromotionEmaiIScheduledTask.java-   1518 May 20, 2004 ProcessAuctionResults.java-   1049 May 20, 2004 ProcessAuctionResultsScheduledTask.java-   577 May 20, 2004 ProcessBackgroundSongRatingCutoffs.java-   4400 May 20, 2004 ProcessBids.java-   1254 May 20, 2004 ProcessBidsDollars.java-   1025 May 20, 2004 ProcessBidsDollarsScheduledTask.java-   1245 May 20, 2004 ProcessBidsPoints.java-   993 May 20, 2004 ProcessBidsPointsScheduledTask.java-   708 May 20, 2004 ProcessDirtySongDeedRatings.java-   1107 May 20, 2004 ProcessDirtySongDeedRatingsScheduledTask.java-   1098 May 20, 2004 ProcessSongBGInfoScheduledTask.java-   3318 May 20, 2004 
ProcessTopScorerContest.java-   1103 May 20, 2004 ProcessTopScorerContestScheduledTask.java-   5514 May 20, 2004 PromotedTopic.java-   764 May 20, 2004 PromotedTopicDollars.java-   4121 May 20, 2004 PromotedTopicList.java-   678 May 20, 2004 PromotedTopicListDollars.java-   672 May 20, 2004 PromotedTopicListPoints.java-   759 May 20, 2004 PromotedTopicPoints.java-   12168 Aug. 11, 2004 RPC2Handler.java-   526 May 20, 2004 ReinitializeSongMath.java-   903 May 20, 2004 SongBackgroundInfo.java-   1309 May 20, 2004 SongBestDeedList.java-   1327 May 20, 2004 SongChangedBestDeedList.java-   1316 May 20, 2004 SongChangedDeedList.java-   22561 May 20, 2004 SongDeed.java-   1005 May 20, 2004 SongDeedComment.java-   514 May 20, 2004 SongDeedList.java-   2651 May 20, 2004 SongDeedListImplementor.java-   602 May 20, 2004 SongDeedListSearcher.java-   618 May 20, 2004 SongDeedNotifyEvent.java-   1433 May 20, 2004 SongDeedRating.java-   3573 May 20, 2004 SongDeedValidator.java-   3413 May 20, 2004 SongDeed_Fan.java-   821 May 20, 2004 SongFanDeedList.java-   933 May 20, 2004 SongFanValue.java-   8473 May 20, 2004 SongLink.java-   688 May 20, 2004 SongPointsChange.java-   4190 May 20, 2004 SongSearchBestDeedList.java-   11881 May 20, 2004 SongTopic.java-   7714 May 20, 2004 SongTopicBid.java-   776 May 20, 2004 SongTopicBidDollars.java-   915 May 20, 2004 SongTopicBidDollarsList.java-   4802 May 20, 2004 SongTopicBidList.java-   774 May 20, 2004 SongTopicBidPoints.java-   925 May 20, 2004 SongTopicBidPointsList.java-   1015 May 20, 2004 SongTopicComment.java-   4888 May 20, 2004 Vendor.java    ./songsifter/java/com/transpose/songdeed/jobs:-   639 May 20, 2004 processbids.jsp-   497 May 20, 2004 processsongratings.jsp    ./songsifter/java/com/transpose/songdeed/servlets:-   7637 May 20, 2004 AlbumBidServlet.java-   841 May 20, 2004 AppInit.java-   4995 May 20, 2004 AuctionServlet.java-   5335 May 20, 2004 DeedRatingServlet.java-   5598 May 20, 2004 EditSongDeedServlet.java-   28595 
May 20, 2004 FanServlet.java-   3695 May 20, 2004 GeneralCommentServlet.java-   7034 May 20, 2004 LoginServlet.java-   254 May 20, 2004 Makefile-   2334 May 20, 2004 NewsServlet.java-   4153 May 20, 2004 PayPalServlet.java-   1148 May 20, 2004 RPC2.java-   3067 May 20, 2004 SongDeedCommentServlet.java-   925 May 20, 2004 SongModeratorCommentServlet.java-   629 May 20, 2004 SongTopicBidDollarsServlet.java-   626 May 20, 2004 SongTopicBidPointsServlet.java-   5988 May 20, 2004 SongTopicBidServlet.java-   3100 May 20, 2004 SongTopicCommentServlet.java-   778 May 20, 2004 SpendMyPointsServlet.java-   1590 May 20, 2004 StoreSongDeedServlet.java-   4572 May 20, 2004 StressTestServlet.java    ./songsifter/java/com/transpose/songdeed/test:-   866 May 20, 2004 addToFanGenreList.jsp-   1102 May 20, 2004 reloadblog.jsp-   832 May 20, 2004 testannouncement.jsp-   634 May 20, 2004 testartistwebsite.jsp-   469 May 20, 2004 testblog.jsp-   847 May 20, 2004 testdeedfanlist.jsp-   2043 May 20, 2004 testerror.jsp-   2134 May 20, 2004 testfanoption.jsp-   979 May 20, 2004 testfanpoints.jsp-   562 May 20, 2004 testgenres.jsp-   829 May 20, 2004 testgetlink.jsp-   3703 May 20, 2004 testpoints.jsp-   792 May 20, 2004 testpromotedtopiclist.jsp-   1636 May 20, 2004 testsearch.jsp-   696 May 20, 2004 testshowsongdeed.jsp-   3179 May 20, 2004 testsongdeed.jsp-   724 May 20, 2004 testsongdeed_fan.jsp-   911 May 20, 2004 testsongdeedhistoryvector.jsp-   1534 May 20, 2004 testsongdeedlist.jsp-   4594 May 20, 2004 testsongdeedrating.jsp-   681 May 20, 2004 testsongdeedvalue.jsp-   755 May 20, 2004 testsongfanvalue.jsp-   997 May 20, 2004 testsongtopicbid.jsp-   1105 May 20, 2004 testsongtopicbidlist.jsp-   1131 May 20, 2004 testsongtopicbidpointslist.jsp-   1827 May 20, 2004 testsongtopiccomment.jsp-   902 May 20, 2004 testsongtopiccommentdate.jsp-   1366 May 20, 2004 testsongtopiccommentlist.jsp-   1063 May 20, 2004 testsongtopicexists.jsp    ./songsifter/java/com/transpose/tags:-   803 Jul. 
28, 2004 DisplayAIM.java-   7129 Jul. 28, 2004 DisplayDeedHistory.java-   1763 Jul. 28, 2004 DisplayGenreCheckboxList.java-   1349 Jul. 28, 2004 DisplayGenreCheckboxListLoggedIn.java-   994 Jul. 28, 2004 DisplayGenreDropDown.java-   806 Jul. 28, 2004 DisplayICQ.java-   1633 Jul. 28, 2004 DisplayLatestDetailedNews.java-   1179 Jul. 28, 2004 DisplayLatestNews.java-   3351 Jul. 28, 2004 DisplayListNavigation.java-   3912 Jul. 28, 2004 DisplayPlainMusicLinks.java-   1217 Jul. 28, 2004 DisplayPresetGenreDropDown.java-   375 Jul. 28, 2004 DisplaySongBestDeedList.java-   421 Jul. 28, 2004 DisplaySongChangedBestDeedList.java-   413 Jul. 28, 2004 DisplaySongChangedDeedList.java-   16634 Jul. 28, 2004 DisplaySongDeedList.java-   893 Jul. 28, 2004 DisplaySongFanDeedList.java-   3371 Jul. 28, 2004 DisplaySongLinks.java-   907 Jul. 28, 2004 DisplaySongNeedyDeedList.java-   1385 Jul. 28, 2004 DisplaySongSearchBestDeedList.java-   2945 Jul. 28, 2004 DisplayTopScorers.java-   2857 Jul. 28, 2004 DisplayTopScorersToday.java-   1366 Jul. 28, 2004 DisplayTopScouts.java-   1370 Jul. 28, 2004 DisplayTopWriters.java-   979 Jul. 28, 2004 FairtunesSearchURL.java-   239 Jul. 28, 2004 Makefile-   921 Jul. 28, 2004 Picture.java-   1319 Jul. 28, 2004 VendorList.java-   1601 Jul. 28, 2004 VendorSearchURL.java    ./songsifter/java/com/transpose/tags/test:-   473 Jul. 
28, 2004 testtopscorers.jsp    ./songsifter/java/com/transpose/util:-   1531 May 20, 2004 Assertjava-   5363 May 20, 2004 BreadCrumbs.java-   1302 May 20, 2004 CookieUtils.java-   722 May 20, 2004 DBConfig.java-   3000 May 20, 2004 DBConnectionHelper.java-   1624 May 20, 2004 DBQueryHelper.java-   2009 May 20, 2004 DBUpdateHelper.java-   2393 May 20, 2004 DateUtils.java-   7010 May 20, 2004 DocumentObject.java-   2134 May 20, 2004 Dumper.java-   1267 May 20, 2004 DynamicPagedList.java-   11297 May 20, 2004 ElementObject.java-   4450 May 20, 2004 ErrorNotifier.java-   4059 May 20, 2004 HashUtilities.java-   2465 May 20, 2004 ID.java-   321 May 20, 2004 KeyNotFoundException.java-   5937 Sep. 8, 2004 KeyedStoreRecord.java-   555 May 20, 2004 LoggedException.java-   1026 May 20, 2004 Mailer.java-   247 May 20, 2004 Makefile-   13714 May 20, 2004 Normalize.java-   1448 May 20, 2004 PagedList.java-   8154 May 20, 2004 PreparedStatementHelper.java-   2519 May 20, 2004 RSSDocument.java-   1017 May 20, 2004 RSSEnclosure.java-   1844 May 20, 2004 RSSItem.java-   2237 May 20, 2004 RadioBlogger.java-   1069 May 20, 2004 RandomString.java-   5422 May 20, 2004 Rating.java-   6068 May 20, 2004 ResultSetHelper.java-   982 May 20, 2004 SQLFormat.java-   458 May 20, 2004 Singleton.java-   3166 May 20, 2004 SingletonStoreRecord.java-   3672 May 20, 2004 SongHash.java-   23556 Sep. 
8, 2004 StoreRecord.java-   656 May 20, 2004 StringDumper.java-   5193 May 20, 2004 StringFormat.java-   900 May 20, 2004 TestURL.java-   3474 May 20, 2004 TransactionConnection.java-   621 May 20, 2004 WaitThread.java-   657 May 20, 2004 XMLParsingException.java-   1671 May 20, 2004 XercesErrorHandler.java-   6628 May 20, 2004 XmlWriter.java    ./songsifter/java/com/transpose/util/servlets:-   248 May 20, 2004 Makefile    ./songsifter/jsps:-   4215 May 20, 2004 about.jsp-   4976 May 20, 2004 aboutartists.jsp-   2419 May 20, 2004 aboutauctions.jsp-   7213 May 20, 2004 aboutcriteria.jsp-   2982 May 20, 2004 abouthosting.jsp-   5330 May 20, 2004 aboutnewmusic.jsp-   3313 May 20, 2004 aboutpoints.jsp-   3747 May 20, 2004 aboutpredict.jsp-   2874 May 20, 2004 aboutrecommend.jsp-   4670 May 20, 2004 aboutreviews.jsp-   3798 May 20, 2004 aboutsponsor.jsp-   4074 May 20, 2004 aboutthecompetition.jsp-   930 May 20, 2004 addtomailinglist.jsp-   1082 May 20, 2004 admin.jsp-   2407 May 20, 2004 allbuckssponsors.jsp-   1515 May 20, 2004 allpointssponsors.jsp-   2407 May 20, 2004 allsponsors.jsp-   4960 May 20, 2004 artistalreadyloggedin.jsp-   2671 May 20, 2004 artistlist.jsp-   4073 May 20, 2004 audiohelp.jsp-   3893 May 20, 2004 badge.jsp-   448 May 20, 2004 badge_bestrecs.jsp-   3196 May 20, 2004 badgedata_bestrecs.jsp-   4303 May 20, 2004 badges.jsp-   895 May 20, 2004 badgestyle.css-   8285 May 20, 2004 best.jsp-   7323 May 20, 2004 changed.jsp-   4193 May 20, 2004 changedbest.jsp-   1717 May 20, 2004 changegenres.jsp-   9458 May 20, 2004 confirmalbumbid.jsp-   1947 Aug. 
6, 2004 contact.jsp-   1704 May 20, 2004 copyright.jsp-   8752 May 20, 2004 create.jsp-   6597 May 20, 2004 createaccount.jsp-   6632 May 20, 2004 createartistaccount.jsp-   10453 May 20, 2004 createartistrec.jsp-   3887 May 20, 2004 createartistrecthanks.jsp-   2765 May 20, 2004 createbid.jsp-   2294 May 20, 2004 deedstats.jsp-   8086 May 20, 2004 discussion.jsp-   8717 May 20, 2004 edit.jsp-   8738 May 20, 2004 editartistrec.jsp-   3741 May 20, 2004 emailafriend.jsp-   2375 May 20, 2004 error.jsp-   1309 May 20, 2004 fanheader.jsp-   3402 May 20, 2004 fanlist.jsp-   25449 May 20, 2004 faq.jsp-   2028 May 20, 2004 friends.jsp-   5880 May 20, 2004 gettingstarted.jsp-   3311 May 20, 2004 help.jsp-   501 May 20, 2004 help_artistweeklyemail.jsp-   543 May 20, 2004 help_asterisks.jsp-   416 May 20, 2004 help_beta.jsp-   416 May 20, 2004 help_mailinglist.jsp-   647 May 20, 2004 help_musiclist.jsp-   491 May 20, 2004 help_mypoints.jsp-   621 May 20, 2004 help_myprivate.jsp-   537 May 20, 2004 help_mypublic.jsp-   486 May 20, 2004 help_mysite.jsp-   576 May 20, 2004 help_sponsoreddollars.jsp-   592 May 20, 2004 help_sponsoredpoints.jsp-   535 May 20, 2004 help_topscorers.jsp-   670 May 20, 2004 help_toratelist.jsp-   317 May 20, 2004 helppopupend.jsp-   585 May 20, 2004 helppopupheader.jsp-   519 May 20, 2004 helppopupstart.jsp-   4622 May 20, 2004 index.jsp-   1564 May 20, 2004 l.jsp-   3116 May 20, 2004 lastloginlist.jsp-   2141 May 20, 2004 lastmusiccomments.jsp-   2175 May 20, 2004 lastratingnotes.jsp-   2186 May 20, 2004 lastrecommendationcomments.jsp-   6545 May 20, 2004 login.jsp-   304 May 20, 2004 logout.jsp-   3190 May 20, 2004 mailpassword.jsp-   6127 May 20, 2004 memberprofile.jsp-   7522 May 20, 2004 music.jsp-   9332 May 20, 2004 musiccomments.jsp-   9065 May 20, 2004 musicdiscussion.jsp-   5787 May 20, 2004 mypoints.jsp-   17942 May 20, 2004 mysettings.jsp-   12072 May 20, 2004 needrating.jsp-   2103 May 20, 2004 newartistintro.jsp-   4002 May 20, 2004 
newmusiclinks.jsp-   4355 May 20, 2004 newsletter-1-1.jsp-   6165 May 20, 2004 newsletter-1-2.jsp-   116 May 20, 2004 openLetter.jsp-   552 May 20, 2004 paypalfail.jsp-   546 May 20, 2004 paypalsuccess.jsp-   23982 May 20, 2004 positiverecommendation.jsp-   7270 May 20, 2004 preview.jsp-   6849 May 20, 2004 previewartistrec.jsp-   2031 May 20, 2004 privacy.jsp-   3528 May 20, 2004 quickstart.jsp-   9052 May 20, 2004 ratingnotes.jsp-   8962 May 20, 2004 recommendationcomments.jsp-   88 May 20, 2004 robots.txt-   3058 May 20, 2004 rssfeed.jsp-   2727 May 20, 2004 rssfeedsexplained.jsp-   3598 May 20, 2004 rulesforgoodreviews.jsp-   7523 May 20, 2004 searchresults.jsp-   886 May 20, 2004 showbadge.jsp-   428 May 20, 2004 siteoffline.jsp-   9737 May 20, 2004 spendmypoints.jsp-   4335 May 20, 2004 sponsorasong.jsp-   3682 May 20, 2004 sponsoredmusicbucks.jsp-   3764 May 20, 2004 sponsoredmusicpoints.jsp-   4610 May 20, 2004 sponsorwithbucks.jsp-   6706 May 20, 2004 sponsorwithpoints.jsp-   2650 May 20, 2004 startdiscussion.jsp-   7417 May 20, 2004 stats.jsp-   789 May 20, 2004 stresstesting.jsp-   5667 May 20, 2004 style.css-   3201 May 20, 2004 template.jsp-   3280 May 20, 2004 testvalidity.jsp-   3537 May 20, 2004 topmonthlyscorerslist.jsp-   814 May 20, 2004 topreviewwriters.jsp-   3140 May 20, 2004 topscorerslist.jsp-   804 May 20, 2004 topscouts.jsp-   4260 May 20, 2004 tos.jsp-   5865 May 20, 2004 updateemailsettings.jsp-   8251 May 20, 2004 updatememberprofile.jsp-   5404 May 20, 2004 updatepublicprofile.jsp-   5836 May 20, 2004 updatesitesettings.jsp-   906 May 20, 2004 values.jsp-   2259 May 20, 2004 verify.jsp-   1408 May 20, 2004 verify_failed.jsp-   738 May 20, 2004 verif_mailed.jsp-   27188 May 20, 2004 view.jsp-   7399 May 20, 2004 viewalbumauctionitem.jsp-   3393 May 20, 2004 viewdiscussion.jsp-   3562 May 20, 2004 viewforum.jsp-   254 May 20, 2004 viewreview.jsp-   858 May 20, 2004 waitforverify.jsp-   65 May 20, 2004 weblog.jsp-   2255 May 20, 2004 
whyrate.jsp    ./songsifter/jsps/includes:-   1107 May 20, 2004 announcement.jsp-   640 May 20, 2004 autologin.jsp-   440 May 20, 2004 beginbody.jsp-   229 May 20, 2004 endbody.jsp-   2523 May 20, 2004 footer.jsp-   9388 May 20, 2004 header.jsp-   3646 May 20, 2004 jspheader.jsp-   494 Sep. 8, 2004 notice.jsp-   785 May 20, 2004 retrievepoints.jsp-   670 May 20, 2004 setuppaging.jsp-   2151 May 20, 2004 sidebarauctions.jsp-   442 May 20, 2004 sidebarbadge.jsp-   974 May 20, 2004 sidebardiscuss.jsp-   2228 May 20, 2004 sidebarmailinglist.jsp-   1945 May 20, 2004 sidebarmypoints.jsp-   3424 May 20, 2004 sidebarsponsoredmusic.jsp-   2458 May 20, 2004 sidebartopdailyscorers.jsp-   926 May 20, 2004 sidebartopscorers.jsp-   845 May 20, 2004 songdeedlistheader.jsp

TECHNICAL FIELD

The present invention is in the fields of collaborative filtering and online community, typically as implemented on networks of communicating computers.

BACKGROUND ART

Collaborative filtering systems are well known, as are online community systems. Examples of the former include Amazon.com's recommendation technology and other similar systems such as eMusic.com's. Examples of the latter include Google Groups.

However, none of the existing solutions effectively leverages the fact that users of online recommendation systems and online community systems typically own their own computers, and have the opportunity to make the central processing units of those computers available for making such systems more useful and enjoyable.

In particular, the task of matching people with extremely similar tastes and interests becomes very computationally difficult as the number of people increases and as the complexity of the similarity measure increases. With hundreds of thousands or even millions of people such as are typically enrolled in major online services, limitations of server hardware resources constrain the system's ability to find the best matches between people based on taste and interest.

To the degree that such matches are made with real accuracy, “neighborhoods” of individuals with extremely similar interests may be formed that can be used for purposes of recommendation and community.

What is needed, then, is an effective way of leveraging the computers owned by end-users of a community and recommendation system for the purpose of massively-distributed similarity searching.

SUMMARY OF THE INVENTION

The present invention puts the computer used by a particular end-user (the ‘client computer’ or ‘client machine’) to work in finding his or her best matches, thus offloading that computational load from the server. (In some variants, some users' computers may do that work for a manageable number of other users; for purposes of example this summary will not discuss those details.)

To enable the computations to occur in the client machines, the necessary data needs to be transported there. This data consists, at least in part, of ‘profiles’ of various users. Various embodiments do this in different ways, the common denominator being that profiles that are relatively likely to be matches to the user for whom neighbors are being sought arrive first.

Then the client computer conducts a substantially (or completely) exhaustive search of that available data for the very best matches.
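
The client-side search described above can be sketched as follows. This is only an illustrative sketch, not the code on the computer program listing appendix: the names `find_nearest_neighbors` and `overlap_count`, the dictionary profile layout, and the default of ten neighbors are all assumptions made for the example, and the similarity function is a placeholder that any embodiment would replace with its own metric.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical profile layout: item identifier -> rating or play count.
Profile = Dict[str, float]


def find_nearest_neighbors(
    target: Profile,
    candidates: Iterable[Tuple[str, Profile]],
    similarity: Callable[[Profile, Profile], float],
    k: int = 10,
) -> List[Tuple[float, str]]:
    """Score every received candidate and keep the k most similar users."""
    best: List[Tuple[float, str]] = []  # min-heap of (score, user_id)
    for user_id, profile in candidates:
        score = similarity(target, profile)
        if len(best) < k:
            heapq.heappush(best, (score, user_id))
        elif score > best[0][0]:
            # The new candidate beats the weakest neighbor kept so far.
            heapq.heapreplace(best, (score, user_id))
    return sorted(best, reverse=True)  # most similar first


def overlap_count(a: Profile, b: Profile) -> float:
    """Placeholder metric (an assumption): number of items in common."""
    return float(len(a.keys() & b.keys()))
```

Because candidates arrive roughly best-first, such a loop could be stopped early once bandwidth or time runs out while still holding good neighbors; the bounded heap keeps memory use proportional to k rather than to the number of candidates examined.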

Typically at least part of the profile data performs a dual purpose. First, it is used for similarity calculations. Second, it is used for display purposes, so that a user can view taste information pertaining to his neighbors. For instance, in a typical music application, this will include song title and artist information for songs in the neighbors' collections.

This disclosure will make use of a detailed listing of key aspects, followed by a glossary containing definitions for terms used therein.

ASPECT 1. A networked computer system for supplying recommendations and taste-based community to a target user, comprising:

networked means for providing representations of nearest neighbor candidate taste profiles and associated user identifiers in an order such that said nearest neighbor candidate taste profiles tend to be at least as similar to a taste profile of the target user according to a predetermined similarity metric as are subsequently retrieved ones of said nearest neighbor candidate taste profiles,

means to receive said representations of nearest neighbor candidate taste profiles and associated user identifiers on at least one neighbor-finding user node,

said neighbor-finding user nodes each having at least one similarity metric calculator calculating said predetermined similarity metric,

at least one selector residing on at least one of said neighbor-finding user nodes using the output of said at least one similarity metric calculator for building a list representing the nearest-neighbor users,

said list representing said nearest-neighbor users providing access to associated ones of said candidate profiles,

a nearest-neighbor based recommender which uses said associated ones of said candidate profiles to recommend items,

a display for viewing identifiers of recommended items,

a display for viewing identifiers of a plurality of nearest neighbor users,

means to select at least one of said nearest neighbor users from said display of identifiers of a plurality of nearest neighbor users,

a display of information relating to at least one of the items in said nearest neighbor user's collection,

whereby massively distributed processing is harnessed in a bandwidth-conserving way for finding the best neighbors out of the entire population of users, and the same neighborhood is leveraged to provide recommendations as well as highly focused taste-based community for sharing the enjoyment of items including recommended items.

ASPECT 2: The networked computer system of ASPECT 1, further including means to facilitate communication with at least said nearest neighbor users where the type of communication comprises at least one selected from the group consisting of online chat, email, online discussion boards, voice, and video.

ASPECT 3: A networked computer system for supplying recommendations and taste-based community to a target user, comprising:

an ordered plurality of nearest neighbor candidate taste profiles and associated user identifiers such that said nearest neighbor candidate taste profiles tend to be at least as similar to a taste profile of the target user according to a predetermined similarity metric as are subsequently positioned ones of said nearest neighbor candidate taste profiles,

networked means to receive said nearest neighbor candidate taste profiles and associated user identifiers on at least one neighbor-finding user node,

said neighbor-finding user nodes each having at least one similarity metric calculator calculating said predetermined similarity metric,

at least one selector residing on at least one of said neighbor-finding user nodes using the output of said at least one similarity metric calculator for building a list representing the nearest-neighbor users,

said list representing said nearest-neighbor users providing access to associated ones of said candidate profiles,

a nearest-neighbor based recommender which uses said associated ones of said candidate profiles to recommend items,

a display for viewing identifiers of recommended items,

a display for viewing identifiers of a plurality of nearest neighbor users,

means to select at least one of said nearest neighbor users from said display of identifiers of a plurality of nearest neighbor users,

a display of information relating to at least one of the items in said nearest neighbor user's collection,

whereby massively distributed processing is harnessed in a bandwidth-conserving way for finding the best neighbors out of the entire population of users, and the same neighborhood is leveraged to provide recommendations as well as highly focused taste-based community for sharing the enjoyment of items including recommended items.

ASPECT 4: The networked computer system of ASPECT 1, further including a single downloadable file that contains software that executes all necessary non-server computer instructions.

GLOSSARY

REPRESENTATION: In the above discussion of “aspects,” representations may be the user profiles themselves (including the taste profiles), or just the taste profiles (which should include an identifier of the user), or they may be user ID's of the users, or URL's enabling the data to be located on the network, or any other data that allows taste profiles and associated user ID's to be accessed. These are all functionally equivalent from the standpoint of the invention.

TASTE PROFILE: This term refers to data representing an individual's tastes or interests. It can take many forms. It may be the XML file generated by Apple's iTunes application which contains a list of music files in the user's collection as well as how many times he has played each one, and other related information. This is a fairly complete profile, having the disadvantage that it tends to consume a fairly large number of bytes that thus take significant bandwidth to download.

Other profile types include simple lists of song identifiers or album or artist identifiers, or various combinations thereof. In non-music domains, other examples include book ISBN's, or author names, or combinations thereof; or weblog URL's, or weblog posting identifiers, or combinations thereof; or any of a multitude of other representations of a user's tastes and/or interests.

Just as different profile types may contain various different types of data, there are many formats that can be used for representing such data to be processed by a computer. XML is one, but such specifications as CORBA and many others provide ways that data objects can be represented and transported across a network, and in general such formats as vectors or other binary or text-based formats can be used.

A taste profile is data that represents a user's tastes and/or interests. The format and contents are particular to particular embodiments, and it must not be construed that the present invention is limited in scope to particular contents or formats as long as the data comprises a user's tastes and/or interests or some useful summary thereof.

Further, it should be noted that a user may have a plurality of taste profiles. For instance, a user may have one type of music he likes to listen to while studying, and another type he likes to listen to while dancing. Preferred embodiments of the invention allow the user to choose different taste profiles, and correspondingly different nearest neighbors and recommendations, according to mood.

Still further, note that taste profiles may be either manually or passively generated. For instance, the iTunes application captures user activity in the course of playing music, and stores it to its associated XML file. The user does not have to make any separate effort to cause a taste profile to be generated based upon that data. On the other hand, taste profiles can be manually generated by manually supplying ratings to items such as songs, movies, or artists. A playlist (a list of songs a user likes to play together, and which has usually been generated manually) can be considered in some embodiments to be a taste profile. Some embodiments use taste profiles that incorporate a combination of passively and actively collected data. For instance, a profile may include manually-generated ratings of songs, as well as the number of times each song has been played.

Finally, note that taste profiles do not necessarily include data directly entered by the user; they can instead be a computer-derived representation. For instance, in embodiments which associate information such as genre or tempo with songs, software developers of ordinary skill will be able to see how to summarize data for songs the user has in his collection to create a profile showing which genres or tempos the user likes most; that information may then comprise the user's taste profile. Or, in certain embodiments with numeric values for attributes, the log of the values may be used.
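
One way such a computer-derived profile might be summarized is sketched below. It assumes, purely for illustration, that each song in the collection carries a single genre label; the function name `genre_profile` is hypothetical and does not refer to the source code on the computer program listing appendix.

```python
from collections import Counter
from typing import Dict, Iterable


def genre_profile(song_genres: Iterable[str]) -> Dict[str, float]:
    """Summarize a collection as the fraction of songs in each genre."""
    counts = Counter(song_genres)
    total = sum(counts.values())
    if total == 0:
        return {}  # an empty collection yields an empty profile
    return {genre: n / total for genre, n in counts.items()}
```

For a collection of three rock songs and one jazz song, this yields a profile of 0.75 for rock and 0.25 for jazz. Such a summary is far smaller than the full collection listing, which suits the bandwidth concerns noted earlier.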

TARGET USER: The aspect discussion describes the invention in a way that focuses on serving a particular user, whom we call the “target user.” There is a plurality of users who could be considered to be target users, but for descriptive purposes we focus on one such user.

USER PROFILE: A user profile contains information related to the individual such as his name, contact information, and biographical text. It also contains his taste profile. An embodiment may make all, some, or none of this information publicly available.

SIMILARITY METRIC: Degrees of similarity are computed according to a similarity metric, which is not necessarily a “metric” in the formal sense of a “metric space” as that term is used in mathematical literature (for instance http://en.wikipedia.org/wiki/Metric.space). A very great variety of similarity metrics are available. There is necessarily a correspondence between the nature of the similarity metric and the taste profile, because similarity metrics often require particular types of data.

For instance, if ratings data is present where numerical values are given such as on a scale from 1 to 7 where 1 is poor and 7 is excellent, such simple methods can be used as computing the average difference between the ratings of the items which have ratings in both taste profiles. Other techniques include computing a Euclidean distance, Mahalanobis distance, cosine similarity, or Pearson's r correlation using that data [13, 15]. Another approach is given in [16], beginning column 20, line 59. Any other computation that results in a metric that tends to be indicative of similarities of taste between the two users can be used.
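
Three of the rating-based computations named above can be sketched as follows. This is a minimal sketch under stated assumptions: profiles are taken to be dictionaries from item identifier to rating, only items rated in both profiles are compared, and the function names are hypothetical. Note that the first two are distances (smaller means more similar), whereas Pearson's r grows with agreement.

```python
import math
from typing import Dict, List, Tuple

Ratings = Dict[str, float]  # assumed layout: item identifier -> rating


def co_rated(a: Ratings, b: Ratings) -> Tuple[List[float], List[float]]:
    """Return the two users' ratings for the items both have rated."""
    shared = sorted(a.keys() & b.keys())
    return [a[i] for i in shared], [b[i] for i in shared]


def mean_abs_difference(a: Ratings, b: Ratings) -> float:
    """Average difference between ratings of co-rated items."""
    xs, ys = co_rated(a, b)
    if not xs:
        raise ValueError("no items rated in both profiles")
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def euclidean_distance(a: Ratings, b: Ratings) -> float:
    """Euclidean distance over co-rated items."""
    xs, ys = co_rated(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))


def pearson_r(a: Ratings, b: Ratings) -> float:
    """Pearson's r over co-rated items; 0.0 when it is undefined."""
    xs, ys = co_rated(a, b)
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # one user gave every shared item the same rating
    return cov / math.sqrt(var_x * var_y)
```

Returning 0.0 where r is undefined is a design choice made for the sketch; an embodiment might instead skip such pairs entirely.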

In many embodiments data is massaged to make it more appropriate for use with certain popular similarity metrics. For instance, in a music application when song play counts are included in the taste profile, the songs may be ranked in order of frequency of play; songs in the top seventh have an “implied rating” of 7, songs in the next seventh have an implied rating of 6, etc. This data can then be used with similarity metrics such as those mentioned above.
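
The play-count conversion just described might be sketched as below. The function name `implied_ratings` is hypothetical, and the handling of ties (broken here by song identifier) and of collections whose size is not a multiple of seven are assumptions, since the text does not specify them.

```python
from typing import Dict


def implied_ratings(play_counts: Dict[str, int], levels: int = 7) -> Dict[str, int]:
    """Rank songs by play count and assign ratings from `levels` down to 1."""
    # Most-played first; ties are broken by song identifier (an assumption).
    ranked = sorted(play_counts, key=lambda song: (-play_counts[song], song))
    n = len(ranked)
    ratings: Dict[str, int] = {}
    for position, song in enumerate(ranked):
        band = position * levels // n  # 0 for the top band, levels - 1 for the last
        ratings[song] = levels - band
    return ratings
```

With fourteen songs, for example, the two most-played songs receive an implied rating of 7, the next two a rating of 6, and so on down to 1.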

Note that some similarity metrics, such as Pearson's r, enable the computation of levels of probabilistic certainty, or p-values, with respect to a null hypothesis. In many cases, such as r, it is possible to state a null hypothesis that roughly corresponds to the concept “the two users have no particular tendency to agree.” This enables the system to take into account the fact that some pairs of users have more data to base the metric on than others, and thus more reason to have confidence. This is a significant advantage over many of the simpler techniques. However, this approach nevertheless has a drawback. As an example, consider two users with a very large number of items in common which they have each rated, where a p-value derived from r is used as the metric. Suppose further that on average, there is a slight tendency to agree rather than disagree. Then, simply due to the large number of items with ratings in common, the p-value may be extremely indicative of rejection of the null hypothesis, even though on average, there isn't a very unusual amount of agreement between ratings. In practical use with a large number of users, where not too many nearest neighbors need to be found, this effect is normally not a major problem, because there will also be users who do have a lot of agreement and who also have a high number of rated items in common, and such pairings will result in even greater extremities of p-values. In such cases, there can be a lot of confidence that the similarity metric is finding users who are actually very similar in taste, even though there may be other pairings, with even more similarity, that are left behind due to not having as much data for comparison.

The immediately preceding paragraphs focus on situations where degrees of agreement can be discerned for each item. Another type of profile involves presence/absence data, where all that is known about each item is whether a user is associated with it or not (for instance, whether a user has a particular song in his collection or not). In such cases, such calculations as the well-known Jaccard's Index, Sorensen's Quotient of Similarity, or Mountford's Index of Similarity can be useful.
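
Two of the presence/absence calculations named above can be sketched using their standard definitions. Collections are assumed to be sets of song identifiers, the function names are hypothetical, and returning 0.0 for two empty collections is a choice made for the sketch. Mountford's Index is omitted because it is usually obtained by iteration rather than from a closed form.

```python
from typing import Set


def jaccard_index(a: Set[str], b: Set[str]) -> float:
    """Shared items divided by all distinct items held by either user."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def sorensen_quotient(a: Set[str], b: Set[str]) -> float:
    """Twice the shared items divided by the sum of both collection sizes."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return 2 * len(a & b) / total
```

For collections {s1, s2, s3} and {s2, s3, s4}, Jaccard's Index is 2/4 and Sorensen's Quotient is 4/6.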

Some embodiments combine different similarity metrics. For instance, r can be used to compute a degree of similarity in ratings of items that are in common between two users, and Jaccard's Index to compute the degree of similarity implied by the numbers of items that are and are not in common between the users. An average or geometric mean (weighted or not) may be used to combine the metrics into one that incorporates both kinds of information; other techniques such as p-value combining with respect to a null hypothesis ([16]) can be used as well, by converting the metrics into p-values.
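
A weighted geometric mean of the two metrics might look like the sketch below. One assumption deserves emphasis: because r ranges from -1 to 1 while a geometric mean needs non-negative inputs, the sketch first rescales r linearly onto the interval from 0 to 1. That rescaling, like the function names, is an illustrative choice and is not taken from the disclosure.

```python
import math
from typing import Optional, Sequence


def weighted_geometric_mean(
    values: Sequence[float], weights: Optional[Sequence[float]] = None
) -> float:
    """Weighted geometric mean of non-negative values."""
    if weights is None:
        weights = [1.0] * len(values)
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    if any(v < 0 for v in values):
        raise ValueError("geometric mean requires non-negative values")
    if any(v == 0 for v in values):
        return 0.0  # any zero factor drives the product to zero
    total_weight = sum(weights)
    log_sum = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(log_sum / total_weight)


def combined_similarity(
    r: float, jaccard: float, r_weight: float = 1.0, jaccard_weight: float = 1.0
) -> float:
    """Combine Pearson's r and Jaccard's Index into one score in [0, 1]."""
    rescaled_r = (r + 1.0) / 2.0  # assumption: map [-1, 1] onto [0, 1]
    return weighted_geometric_mean([rescaled_r, jaccard], [r_weight, jaccard_weight])
```

A geometric mean penalizes a pair that scores well on only one metric more heavily than an arithmetic average would, which may or may not be desirable in a given embodiment.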

Source code described in the file tasteprofileclass.py in Appendix 4 and included in the computer program listing appendix submitted on CD pursuant to 37 C.F.R. 1.96 takes a different approach for computing similarity based on iTunes' XML file. Consider a “shared song” to be a song that is in the collection of both users. This method calculates an approximate probability that the next shared song to come into existence will be the next song played. That is, if user A takes a recommendation from B's collection, it will be a song that A doesn't have yet. When he has it, it will be another shared song. What is the probability that it will be the next song played, once it is in A's collection? This is a particularly appropriate similarity measure, because it measures similarity of tastes in a way that directly relates to a key purpose of finding nearest neighbors: making recommendations that the user will want to play frequently. Details of the algorithm appear in the source code. That algorithm is the currently preferred similarity metric.

The only requirement of the similarity metric is that, for a significant portion of pairs of users which includes those who tend to be the most similar in taste, the following applies: if the calculated similarity of two taste profiles A and B is greater than the calculated similarity of two taste profiles A and C, then it is likelier than not that users A and B are actually more similar in relevant tastes than are users A and C. This likelihood will be greater for similarity metrics that will be associated with the highest-performing embodiments of the invention. For instance, simply using the average distance between ratings may be acceptable for some applications, but using Euclidean distance is better than a simple average.

There are many ways to calculate similarity. Other than the requirement above, the invention has no dependence on the particular similarity metric that may be chosen by a particular embodiment. The invention must not be construed to be limited to a particular similarity metric or type of similarity metric; the ones listed here are for reasons of example only. Similarity metrics are interchangeable for purposes of the invention.

MEANS FOR FACILITATING RETRIEVAL OF REPRESENTATIONS: There are a variety of ways to provide the functionality needed. It must be stressed that all provide identical or equivalent functionality for the purposes of the invention. While there are several basic structures available, there are many variants for each that are only insubstantially different and should not be construed as different in a way that would make them fall outside the scope of the invention.

What is needed is a means for facilitating retrieval of representations of nearest neighbor candidate taste profiles and associated user identifiers in an order such that said nearest neighbor candidate taste profiles tend to be at least as similar to a taste profile of the target user according to a predetermined similarity metric as are subsequently retrieved ones of said nearest neighbor candidate taste profiles.

The representations mentioned in the previous paragraph may be the user profiles themselves (including the taste profiles), or just the taste profiles (which should include an identifier of the user), or they may be user ID's of the users, or URL's enabling the data to be located on the network, or any other data that allows taste profiles and associated user ID's to be accessed. These are all functionally equivalent from the standpoint of the invention.

It is important to note that the means for facilitating this retrieval does not need to make use of the predetermined similarity metric or a calculator that can calculate it. In particular, it isn't required that the retrieval of representations is exactly in the same order that would be given by the similarity metric.

One implication of this is that even if the similarity metric is not a metric in the sense of a metric space, a metric space-based metric can be used in the means for facilitating this retrieval. This makes available a large number of algorithms in the literature for facilitating the retrieval.

In preferred embodiments the data used in facilitating this retrieval is a subset of the data used in the similarity metric, or a summary derived from that data, or a combination of the two, in order to lower computational costs.

1) Pre-Existing Data Structures

Data structures may be created that provide the foundation for retrieval in the necessary order or sequence. For instance, clustering may be done using a variety of methods. See, for example, [1] and [2] which apply to “metric spaces,” that is, a structure involving a distance function where the function used to compute the distance between any two objects satisfies the positivity, symmetry, and triangle inequality postulates. Such a distance function can be a similarity metric; examples include Euclidean distance.

See also [3] which works on large binary data sets where data points have high dimensionality and most of their coordinates are zero. For instance, this can be used to cluster based upon attributes consisting of indicators of whether or not a user has a particular song in his collection. See also [4].

Appendix 4 describes source code (genrerankhandler.py), which appears on the computer program listing appendix, and which contains an algorithm which uses genre data, but a practitioner of ordinary skill in the art will see how to modify it for use with other kinds of data of limited dimensionality.

For a given clustering scheme, practitioners of ordinary skill in the art will know how to compare a particular taste profile to a particular cluster of taste profiles, and thus determine an affinity between each cluster and the taste profile.

Then, the cluster with the most computed affinity to the given taste profile is first in the retrieval order, the cluster with the next most computed affinity is returned next, etc. Of course, there can be some degree of difference from this strict order without violating the spirit of the invention or moving outside its scope. When we discuss retrieving a cluster, we mean either a set of representations of nearest neighbor candidate user profiles, or a representation of such representations. For instance, such a representation can be the name or Internet address of a file containing the representations of candidates.
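
The affinity-ordered retrieval described above might be sketched as follows. The sketch assumes each cluster is summarized by a centroid vector (for example, genre proportions) and is identified by a file name, and it uses cosine similarity as the affinity; none of these choices, nor the names `cosine_affinity` and `cluster_retrieval_order`, is taken from the source code on the computer program listing appendix.

```python
import math
from typing import Dict, List, Sequence


def cosine_affinity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two summary vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_retrieval_order(
    target_summary: Sequence[float],
    cluster_centroids: Dict[str, Sequence[float]],
) -> List[str]:
    """Return cluster file names, highest affinity to the target first."""
    return sorted(
        cluster_centroids,
        key=lambda name: cosine_affinity(target_summary, cluster_centroids[name]),
        reverse=True,
    )
```

The client would then fetch the named files in the returned order, so that the most promising candidates arrive first. Note that the affinity used for ordering need not be the predetermined similarity metric that the client later applies to individual profiles.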

Another approach which uses clustering is given in [5].

Clusters are not the only kind of structure that can be used. See, for example, [6] and [4]. Practitioners of ordinary skill will see how to use such structures for retrieving in an order consistent with the needs of the invention. Many such structures exist with different details of implementation, but these details are not substantial differences for the purposes of the invention. It is not possible to list all possible combinations of such details, and it must not be construed that one can move outside of the scope of the invention merely by finding such variations on the structures listed here, which it cannot be stressed enough are listed for reasons of example only.

The source code in Appendix A provides the exemplary key aspects of one particular method for causing the representations to be retrieved in an order consistent with the needs of the invention. See the explanatory text in the section for clusterfitterclass.py.

Of course, preferred embodiments update or replace these structures over time as taste profiles associated with users change, and as users are added to or removed from the database associated with the embodiment.

Note further that the data structure may be built and stored on a central server, on machines owned by end-users of the invention which communicate their results directly to a server and/or to other end-user machines via peer-to-peer means, or on a combination. It must not be construed that a system falls outside of the scope of the invention merely because the necessary computational and storage resources for the foundation for retrieval are provided at one location or set of locations rather than another, or one type of network node rather than another.

As one example of a combined approach, consider [7]. That paper provides an algorithm to do clustering based on nearest neighbors. It can be leveraged to produce a combined approach as follows.

Use a peer-to-peer system such as the Gnutella protocol or any other protocol that enables one to search for a file. Each end-user machine is a node in such a network, also known as a “cloud.”

Each end-user machine then conducts a search for each file (or for a substantial subset of the files) already in that machine's collection, using the words in the name of each file (or a substantial subset of them). A “hit” occurs when the protocol returns an identifier of a node that has a file with matching words in its name.

Some searches will get more “hits” than others.

For purposes of the algorithm in [7], “nearest neighbors” will have a different definition than the one involving the predetermined similarity metric of the present invention. It involves a couple of components.

The first component is “hit-nearness.” Suppose a query returns only 1 hit. That means that the node identified by that hit is considered to be in the first tier of hit-nearness. If it returns 2 hits, each of the nodes is considered to be in the second tier of hit-nearness. And so on. The tiers are ranked, and the ranks are divided by the number of tiers. If T is the number of tiers, the best hit-nearness is 1/T, the next best is 2/T, and the worst is T/T (that is, 1).

The next component is “quantity-nearness.” We count the number of times a particular node's identifier is retrieved in the process of searching for files. We create tiers based on those numbers using the same tiered approach as for hit-nearness, again resulting in a number between 0 and 1, where the worst node (the node with the smallest number of hits) has a quantity-nearness, Q, of 1.

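Then the distance of a node to the node doing the search is the square root of T * Q, where T in this expression denotes that node's hit-nearness (rather than the number of tiers) and Q its quantity-nearness. So the ordering of each node's neighbors for the algorithm in [7] is laid out that way.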
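As a minimal sketch of the distance computation just described, the following assumes that the hit-nearness of a node is taken from the smallest query (fewest hits) that returned it, a detail the text leaves open, and that the distance is the square root of the product of the node's hit-nearness and quantity-nearness. The names tier_scores and node_distances are hypothetical.

```python
import math
from collections import Counter


def tier_scores(values, best_is_low):
    """Rank the distinct values into tiers; return value -> rank / tier count."""
    tiers = sorted(set(values), reverse=not best_is_low)
    return {value: (rank + 1) / len(tiers) for rank, value in enumerate(tiers)}


def node_distances(query_hits):
    """query_hits holds one list of responding node identifiers per file search."""
    fewest_hits = {}        # smallest hit count of any query that returned the node
    times_seen = Counter()  # number of queries that returned the node
    for hits in query_hits:
        responders = set(hits)
        for node in responders:
            times_seen[node] += 1
            fewest_hits[node] = min(
                fewest_hits.get(node, len(responders)), len(responders)
            )
    # Fewer hits in the query is the better hit-nearness tier.
    hit_nearness = tier_scores(fewest_hits.values(), best_is_low=True)
    # Being returned more often is the better quantity-nearness tier.
    quantity_nearness = tier_scores(times_seen.values(), best_is_low=False)
    return {
        node: math.sqrt(
            hit_nearness[fewest_hits[node]] * quantity_nearness[times_seen[node]]
        )
        for node in times_seen
    }


distances = node_distances([["a"], ["a", "b"], ["a", "b", "c"]])
```

Sorting the resulting dictionary by value, smallest first, gives the neighbor ordering that each node would upload to the server for use by the algorithm in [7].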

The work of finding neighbors for [7] is thus carried out on the end-user machines. Then, that nearest neighbor information is uploaded to the server from each node, and the algorithm in [7] is carried out there.

For instance, the algorithm could include Gnutella protocol code, and use the procedure described above to cluster similar taste profiles together, where similarity is determined by having more neighbors in common (rather than by our predetermined similarity metric).

Then, to determine the order in which clusters should be downloaded to a particular user's node, the one that contains the greatest number of his neighbors should be downloaded first, then the one that has the next greatest number of his neighbors, etc.
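The download ordering just described amounts to sorting clusters by how many of the user's neighbors each contains. A minimal sketch, in which the function name download_order and the dictionary layout are assumptions made for illustration:

```python
def download_order(neighbors, clusters):
    """Return cluster identifiers, most of the user's neighbors first.

    neighbors is an iterable of node identifiers; clusters maps a cluster
    identifier to the node identifiers of its members.
    """
    neighbor_set = set(neighbors)
    return sorted(
        clusters,
        key=lambda cluster_id: len(neighbor_set & set(clusters[cluster_id])),
        reverse=True,
    )


order = download_order(
    ["n1", "n2", "n3"],
    {"c1": ["n1"], "c2": ["n1", "n2", "n3", "z"], "c3": ["n2", "n3"]},
)
```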

2) Dynamic Searches for Neighbor Candidates

Instead of, or in combination with, pre-existing data structures such as those described above, many embodiments use dynamic searches.

Probably the simplest example of this is a server-based system with a table of attributes culled from the taste profiles, one row per user. In one embodiment these attributes are bits representing the presence or absence of particular genres. So, if there are 100 defined genres, each row has 100 bits.

Then, to determine the order in which taste profiles should be downloaded, the server simply checks each row and counts the proportion of matching genres to total genres in the other user's taste profile. The representations of taste profiles with the highest proportions are retrieved first. The table could be a RAM-based bitmap, a database such as one based upon SQL, or any other convenient configuration. Of course, the data used wouldn't have to be genres. It could be a selection of artists or songs or albums, or, in non-music domains, book titles, web logs, paintings, news articles, school subjects, course numbers, etc.
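As one hedged illustration of the genre-bit table, each row can be held as an integer bitmask. The sketch below reads "matching genres" as genres present in both profiles and divides by the number of genres present in the other user's row; that reading, and the names match_proportion and profile_order, are assumptions rather than something the text prescribes.

```python
def match_proportion(target_bits, other_bits):
    """Genres shared with the target, as a fraction of the other user's genres."""
    other_count = bin(other_bits).count("1")
    if other_count == 0:
        return 0.0
    return bin(target_bits & other_bits).count("1") / other_count


def profile_order(target_bits, rows):
    """Return user identifiers, highest proportion of matching genres first.

    rows maps a user identifier to that user's genre bitmask.
    """
    return sorted(
        rows,
        key=lambda user_id: match_proportion(target_bits, rows[user_id]),
        reverse=True,
    )


rows = {"u1": 0b0011, "u2": 0b1100, "u3": 0b0100}
order = profile_order(0b1011, rows)
```

With 100 defined genres the same code applies unchanged, since Python integers are not limited to a fixed width.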

In another set of embodiments, there is virtually no server-based processing at all; the only server processing is to supply network addresses for a set of seed nodes that may be online at the time, which may in fact be included with the download of the software that executes the computer steps involved in the invention.

In these embodiments, a peer-to-peer protocol such as Gnutella's is used to conduct searches for files, as described above in this text. Note that if a pre-existing, popular protocol such as Gnutella's is used, it should be modified so that a node can respond to a request for a complete taste profile; if that does not include a list of all (or a substantial subset of) items on the node's machine, then nodes should also be able to respond to a request for such items.

As described elsewhere in this specification, a node (we will refer to it as the “target node”) initiates searches for files it has in its collection. Nodes that are the subject of hits are candidate nearest neighbors. Nodes that have more files matching the target node's files than others are statistically more likely to be hit before nodes with a smaller number of such files. The representation that comes along with the hit is then used to retrieve the taste profile and, if necessary, the list of files. So, that satisfies the requirement of the means for facilitating retrieval in the desired order. No other server activity is required.

Note that to increase the performance over protocols such as Gnutella that are popular at the current time, currently preferred embodiments use the peer-to-peer method described in [12]. Also, at the time that user machines connect for a new session in the peer-to-peer network, they should connect to randomly chosen seed nodes in order to increase the randomness of results obtained from searches.

It must not be construed that the scope of the present invention is limited to the particular techniques listed here.

3) Note on Retrieval Techniques

Whether the means for facilitating retrieval is based upon a pre-existing data structure or whether dynamic computations are done, there is still the question of actually delivering the representations of nearest neighbor candidate profiles, and if separate, the profiles themselves.

In some embodiments these come directly from the server. In others, such as peer-to-peer techniques like those described above, they may be the result of direct communication with the machine owned by the user whose profile is required.

In some embodiments caching solutions such as BitTorrent [8], FreeNet [9], FreeCache [10] and Coral [11] are used to distribute the representations and/or the profiles. It is preferred to use BitTorrent to distribute cluster files, where the clusters contain the profiles.

4) Further note on scope. It must not be construed that the scope of the invention is limited to the specific examples which are listed here for explanatory purposes. The requirement is that profile representations are retrieved in an order such that the nearest neighbor candidate taste profiles tend to be at least as similar to a taste profile of the target user according to a predetermined similarity metric as are subsequently retrieved ones of said nearest neighbor candidate taste profiles. The intent is not to carry out the impossible task of listing every possible way to achieve that. The intent is to teach a number of ways to achieve that end; other techniques that achieve that end are equivalent for our purposes. That is, such techniques are interchangeable in the sense that they will result in an embodiment of the invention that falls within the scope.

NEAREST-NEIGHBOR: A target user profile's nearest neighbors are the other user profiles whose taste profiles are closest to the target user profile's according to the predetermined similarity metric. However, in preferred embodiments there are exceptions: users can cause entries to be added to the nearest neighbor list that may not be ones that have the most computed similarity, and they may delete entries from the list, and they may cause an entry to become permanent (though manually deletable). They can do these actions manually or through automatic means such as a program that runs through one's email address book and makes the user profiles associated with email addresses found there permanent. Such features may detract from recommendation accuracy while adding to the user's pleasure in the nearest neighbor community.

NEAREST-NEIGHBOR BASED RECOMMENDER: Nearest-neighbor-based recommendation algorithms are well-known in the literature. See, for example, [13] and [14]. The source code file recommenderclass.py, described in Appendix 4 and included in the computer program listing appendix, also includes a technique.

The scope of the present invention should not be construed as limited to any particular nearest-neighbor-based recommendation algorithm. They are fundamentally interchangeable for the intents and purposes of the invention, although some will have better accuracy than others. The currently preferred technique is given in recommenderclass.py.

SERVER: The term “server” as used in this specification means one or more networked computers, incorporating a central processing unit and temporary storage such as RAM and also persistent storage such as hard disks. They perform central functions such as storing a central list of users. While there may be more than one server, they usually do not have to be separately accessed by user-associated computers; rather they present a unified interface. One such example of multiple servers working together is the case of a server computer running software that interacts with client software running on user-associated computers, which uses other computers for database storage and to provide database redundancy.

USER NODE: The computer (also referred to as the “machine”) associated with a human user of the computer, providing one or more input devices such as a keyboard and one or more output devices such as an LCD screen. It is networked, preferably through the Internet, to other user nodes. A common protocol such as TCP/IP is used for communication with other user nodes.

NEIGHBOR-FINDING USER NODE: In currently preferred embodiments all nodes are essentially the same, and play the role of neighbor-finding user nodes; but in some embodiments, certain tasks are relegated to certain of the user nodes. For instance, it may be that certain users are willing to make their computational and bandwidth resources available to others, and that others are less willing; those who are willing may, for example, get a price break.

In such embodiments, neighbor-finding user nodes take it upon themselves to do work for multiple users. For purposes of neighbor-finding, they work either independently of the user nodes they are helping or in concert with them. For instance, they may receive the candidate nearest neighbors for other users, and use their taste profiles to compute the similarity according to the similarity metric, and then pass on only the most similar nearest neighbors to the user nodes across the network.

IDENTIFIERS FOR DISPLAY: Identifiers of items and nearest neighbors are displayed in such visual constructs on a visual computer display as tables in a window or menus such as pop-up menus. Some embodiments may use audio means as a kind of display when visual display is not possible. The identifiers may be identifiers used internally to keep track of the items and users, or they may be special public identifiers supplied by the users or item producers, or any other identifier that is thought would be convenient for the users.

NOTES

While this specification focuses on the example of music recommendation and communities, that is for purposes of example and ease of explanation only. It applies just as completely to other domains, such as books, web logs, web sites, movies, news, educational items, discussion groups, and others. Embodiments in all of these domains, and in other domains which could benefit from taste-based recommendations and communities, are within the scope of the invention. Occasionally in this specification the word “item” is used inclusively to represent the various types of objects of taste or interest.

The word “taste” as used in this specification should not be construed to imply that the invention's scope is limited to artistic works. It applies equally well to information such as news sources. The word “interest” should be considered a synonym for “taste” for purposes of this specification.

Other information besides the taste profiles may be used in finding nearest neighbors. As one example, some embodiments allow the list of nearest neighbors to be restricted to individuals who live in particular physical localities.

The specification sometimes uses the word “machine” as an equivalent for “computer.”

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an overall flowchart illustrating an embodiment in which each client node is responsible for determining its own user's nearest neighbors.

FIG. 2 is a chart showing how the nearest neighbor list 110 is put to use.

MODES FOR CARRYING OUT THE INVENTION

FIG. 1 illustrates an embodiment in which each client node is responsible for determining its own user's nearest neighbors. Representations of user profiles and associated user identifiers 5 are provided in order of likely similarity to the user. See, for example, the descriptive text for clusterfitter.py in Appendix 4, which describes a way a client node can determine the order in which to download each one of a set of clusters. (The source code itself appears in the computer program listing appendix.) In the preferred embodiment, these clusters are downloaded with the help of other client nodes using BitTorrent. In the preferred embodiment there are a limited number of clusters, retrieved by each client node in its own appropriate order. Not every cluster is retrieved by every client, because only a certain amount of time is available to do the downloads. But on the whole, each cluster can generally, in time, be found on a number of client nodes. This enables a BitTorrent tracker running on the server, together with BitTorrent client software running on the clients, to work together to share the community bandwidth to download a cluster to a client that requests it. A programmer of ordinary skill in the art will readily see how to use BitTorrent client software, publicly available in open-source form (http://bittorrent.com/), to accomplish these tasks. Note that there is also an existing BitTorrent “trackerless” option that does not require a tracker on the server, but rather distributes the tracker functionality to the nodes, further diminishing the bandwidth load on the server.

This disclosure contains several additional sections, each designated as an Appendix, and together with the rest of the text and computer code presented herein, forming a unified disclosure of the present invention. As one alternative way of achieving the desired ordering of profiles, see the distributed profile climbing technique described in Appendix 3.

The profiles are received at the user nodes 20 a-c. The similarity of each one to the local user is calculated 30 a-c. The ones that are similar enough 40 a-c to the current user (for instance, by being more similar than the least-similar current member of the nearest neighbor list) are put into the appropriate position 50 a-c in the nearest neighbor list. In preferred embodiments that position is consistent with an ordering by similarity.
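The acceptance test and ordered insertion of steps 40 and 50 can be sketched as below. The fixed maximum list size and the name consider_candidate are assumptions made for the example; the similarity value is taken as already computed in step 30 by whatever predetermined metric the embodiment uses.

```python
import bisect


def consider_candidate(neighbors, user_id, similarity, max_size):
    """Insert a candidate into the nearest neighbor list if it is similar enough.

    neighbors is a list of (similarity, user_id) pairs kept most-similar first.
    Returns True if the candidate was inserted.
    """
    # A full list only admits candidates more similar than its least-similar member.
    if len(neighbors) >= max_size and similarity <= neighbors[-1][0]:
        return False
    # bisect expects ascending keys, so search on negated similarities.
    keys = [-existing for existing, _ in neighbors]
    position = bisect.bisect_right(keys, -similarity)
    neighbors.insert(position, (similarity, user_id))
    del neighbors[max_size:]  # drop whatever fell off the end of the list
    return True


neighbor_list = []
results = [
    consider_candidate(neighbor_list, "a", 0.5, 2),
    consider_candidate(neighbor_list, "b", 0.9, 2),
    consider_candidate(neighbor_list, "c", 0.4, 2),
    consider_candidate(neighbor_list, "d", 0.7, 2),
]
```

An embodiment with permanent neighbors would presumably keep those entries outside this size-limited list, so that they are never displaced by a more similar candidate.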

In FIG. 2 the nearest neighbor list 110 is put to use. Combined with the local user profile 120, recommendations are generated 130 for the user (see, for example, recommenderclass.py, described in Appendix 4 and included in the computer program listing appendix, for an example of how to accomplish that).

Interactive communications are also enabled 140. For instance, preferred embodiments display the user identifiers of nearest neighbors in a list on a computer display. An interaction means such as clicking on a particular icon enables an email to be automatically generated, addressed to the neighbor and indicating that the sender is the current user; the user then fills in the message text and sends it.

BIBLIOGRAPHY: References listed below in this section are hereby incorporated by reference in their entireties to the fullest extent allowed by law.

-   [1] V. Ganti, R. Ramakrishnan, J. Gehrke, A. Powell, and J. French. Clustering large datasets in arbitrary metric spaces. Technical report, University of Wisconsin-Madison, 1998. http://citeseer.ist.psu.edu/ganti99clustering.html
-   [2] M. Ester, H.-P. Kriegel, J. Sander, M. Wimmer, and X. Xu. Incremental clustering for mining in a data warehousing environment. Proc. 24th Intl. Conf. on Very Large Data Bases (VLDB), 1998.
-   [3] C. Ordonez, E. Omiecinski, and N. Ezquerra. A fast algorithm to cluster high dimensional basket data. In IEEE ICDM Conference, 2001. http://citeseer.ist.psu.edu/ordonez01fast.html
-   [4] P. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. In Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 311-321, Austin, Tex., United States, 1993.
-   [5] C. Li, E. Chang, H. Garcia-Molina, and G. Wiederhold. Clustering for approximate similarity search in high-dimensional spaces. IEEE Transactions on Knowledge and Data Engineering, 14(4):792-808, July-August 2002.
-   [6] P. Ciaccia, M. Patella, F. Rabitti, and P. Zezula. Indexing metric spaces with M-tree. In Quinto Convegno Nazionale Sistemi Evoluti per Basi di Dati, pages 67-86, Verona, Italy, 25-27 Jun. 1997.
-   [7] R. A. Jarvis and E. A. Patrick. Clustering using a similarity measure based on shared near neighbors. IEEE Transactions on Computers, C-22(11), pages 1025-1034, November 1973.
-   [8] http://bittorrent.com/
-   [9] http://freenet.sourceforge.net/
-   [10] http://www.archive.org/web/freecache.php
-   [11] http://www.scs.cs.nyu.edu/coral/
-   [12] N. Sarshar, P. Boykin, V. Roychowdhury. Percolation Search in Power Law Networks: Making Unstructured Peer-to-Peer Networks Scalable. Fourth International Conference on Peer-to-Peer Computing, pages 2-9, August 2004.
-   [13] U. Shardanand. Social Information Filtering for Music Recommendation. MIT Master's Degree Thesis, 1994.
-   [14] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Recommender Systems for Large-scale E-Commerce: Scalable Neighborhood Formation Using Clustering. Proceedings of the Fifth International Conference on Computer and Information Technology (ICCIT 2002), 2002.
-   [15] U. Shardanand and P. Maes. Social Information Filtering: Algorithms for Automating “Word of Mouth.” In Proceedings of CHI'95 (Denver, Colo., May 1995), ACM Press, 210-217.
-   [16] U.S. Pat. No. 5,884,282

APPENDIX 1

This appendix describes a number of variations which we consider to be part of the invention.

Some embodiments of the invention use “playlist sites” or “mp3 blogs” or “music blogs” to supply profile information, rather than, or in addition to, profile information stored on a local disk such as the XML database generated by Apple's iTunes product. In typical embodiments this information is collected by a “screen scraping” procedure, either by a process or processes running on the server system, or on user nodes. In some cases, such sites publish song information using OPML or other XML formats such as RSS, which reduces or eliminates the need for screen scraping. For embodiments making use of this capability, profile information will be provided to users of the system that may represent the tastes of other individuals who are not users of the system. To a large degree, the data associated with these individuals is treated identically to the data associated with users. In some aspects it will normally not be possible to treat them identically because less data will be available for them. The adjustments that need to be made in such cases will be readily apparent to the software developers. Note that since this specification focuses primarily on users of the system, there will be cases where the term “user” should be considered to also include “ghost users” derived from external data representing non-users.

Another source of ghost user data is services such as audioscrobbler that make identifiers of songs currently being played by a given user available on the Web. One of ordinary skill in the art will immediately see how to monitor such a service to build up a profile, over time, of users whose currently-played song is displayed.

Some embodiments provide a facility whereby simply loading a web page (and optionally giving permission for security reasons) will cause software to be automatically loaded into the user's machine that provides the necessary functionality; this avoids the separate step of downloading and installing application software. This can be accomplished, for instance, by means of Java-language code called by a Web browser.

Preferred embodiments have a “permanent neighbors” feature, as well as a “machine-generated neighbors list” feature. The machine-generated neighbors list displays identifiers for those users that have been determined to be very close matches in taste or interest to the current user. The permanent neighbors list displays identifiers for users that have been selected by the current user.

In preferred embodiments, user-interface techniques are provided for turning machine-generated neighbors into permanent neighbors. Typically this is done by a drag feature where a member of the displayed list of machine-generated neighbors is dragged to the displayed list of permanent neighbors. Other techniques include allowing the user to select a member of the displayed list of machine-generated neighbors and call a menu option to cause it to be listed as a permanent neighbor; this can be a pop-up menu, a contextual menu, or a standard menu.

Permanent neighbors may be manually removed from the permanent neighbors list by the user; for instance, by means of a menu choice or drag operation. Another option is a checkbox where multiple permanent neighbors can be marked for removal, accompanied by a separate button to cause the removals to happen.

In preferred embodiments, UI elements are provided to enter an email or IM address for an individual, and cause him to be emailed such that the said email includes a link (or other technique) for enabling easy download of client software implementing the invention. In further embodiments, the other individual is automatically added to the permanent neighbors list when he becomes a registered user of the system. This may be accomplished in many ways, readily discernible to one skilled in the art; the scope of the invention should not be construed as being limited to the examples listed in this paragraph; they are listed for reasons of example only. For instance, as the user profiles arrive on the client machines for determining which are nearest neighbors, they can be checked to determine whether an emailed individual is among them. (The addresses of emailed users would be stored on the local user's machine for this purpose.) Alternatively, the client can periodically query a database table residing on the server, to check whether the emailed user has become a registered user.

In preferred embodiments, permanent neighbors can include ghost users, where the ghost users are identified by the local user by appropriate network identifiers. For instance, in the case of online playlists, a URL that identifies the playlist of the particular individual would be one appropriate type of identifier. In further embodiments, the data for such neighbors is retrieved directly (across the network) by the client node without interaction with the server that implements the server portion of the invention.

In preferred embodiments, users may click on the identifier for a permanent neighbor and cause information to be displayed that represents that neighbor's musical tastes, such as a list of artists and/or songs in the neighbor's collection, possibly including such elements as the number of times each song has been played, the date added to the collection, and others; this list is for example only and not intended to be exhaustive. Further embodiments display this data for permanent neighbors in the same onscreen list area that is also used for displaying the analogous data associated with machine-generated neighbors.

In preferred embodiments where neighbors are used as the basis for generating recommendations, it is recognized that permanent neighbors may or may not be the ideal individuals to generate recommendations from. For instance, an individual may be made into a permanent neighbor because he is a friend, rather than because his tastes are remarkably similar to those of the local user. Accordingly, in such preferred embodiments the option is provided to leave permanent neighbors out of the recommendation process. In some such embodiments, this is done as a single binary choice for all permanent neighbors, for instance, using a checkbox that appears in a Preferences dialog. In others, it is done on a one-by-one basis, for instance, with checkboxes accompanying each listed, displayed identifier for permanent neighbors in the user interface. In some embodiments, it is possible to make a single binary choice to indicate that only permanent neighbors are used for recommendations; in others there is a screen widget such as a collection of 3 radio buttons or a standard menu with “sticky” indicators of a previously made selection, where the user can choose between not using permanent neighbors in the recommendations processing, only using permanent neighbors, or using both.
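The three-way choice described above (no permanent neighbors, only permanent neighbors, or both) might be represented as follows. This is a sketch only; the names PermanentMode and recommendation_sources are hypothetical, and the per-neighbor checkbox variant is not shown.

```python
from enum import Enum


class PermanentMode(Enum):
    EXCLUDE = "exclude"  # machine-generated neighbors only
    ONLY = "only"        # permanent neighbors only
    BOTH = "both"        # both lists contribute


def recommendation_sources(machine_generated, permanent, mode):
    """Return the neighbor identifiers to feed to the recommender."""
    if mode is PermanentMode.EXCLUDE:
        return list(machine_generated)
    if mode is PermanentMode.ONLY:
        return list(permanent)
    # BOTH: append permanent neighbors that are not already listed.
    extra = [n for n in permanent if n not in machine_generated]
    return list(machine_generated) + extra


sources = recommendation_sources(["m1", "m2"], ["p1", "m2"], PermanentMode.BOTH)
```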

Preferred embodiments display the most recent date and/or time that each permanent or machine-generated neighbor last used the system, to the extent that the client may be easily aware of that information. For instance, it may be included in profile information that arrives at the local user's node for processing of candidate neighbors, in which case it may not be the most recent data available to the system as a whole. Alternatively, it is retrieved directly from the server when it is to be displayed, and is thus up to date.

Preferred embodiments contain on-screen lists of neighbors (which may include permanent neighbors, or permanent neighbors may be in separate, similar lists); in further preferred embodiments these lists contain screen elements indicating the presence or absence of email addresses for the users (needed because, in preferred embodiments, it is optional to supply an email address and/or to allow other people, preferably including other users, to be made aware of it). In further embodiments, clicking on such an element causes an email application to be opened and an automatically-addressed email to be generated, to be populated with content by the user. Similarly, elements indicating an IM address, or other communications handles, may be displayed, and UI functionality provided to facilitate such communications. In some such embodiments one element is provided for each neighbor to indicate that one or more modes of communication are available, and clicking it causes a menu to appear that lists them; choosing one facilitates communication by the chosen mode. In other embodiments, the user selects the list row containing the user identifier, and brings up a standard menu to choose a mode to communicate with the selected user; when communication handles are not provided for a particular mode, that one is greyed-out. A software developer of ordinary skill will readily see other variations of how to facilitate user interaction regarding what modes are available and how to facilitate engaging each one. Such variants which contain some on-screen indicator of the availability of communications with a given user are within the scope of the invention. Software developers of ordinary skill in the art will immediately see how to implement this.

In preferred embodiments certain individuals are registered as being artists. When an item such as a song by such an individual is displayed on screen, and if the artist has indicated that he wishes communications with him to be enabled, an indicator of that is provided, and UI techniques for facilitating such communications are provided; these techniques will generally be similar to those already discussed for user-to-user communication. Software developers of ordinary skill in the art will immediately see how to implement this.

When artists communicate with users, preferred embodiments monitor the uniqueness of the communications, in an attempt to determine whether artists are really communicating one-to-one with users. One way to determine this would be to randomly sample a number of pairs of communications from artists, and use “diff” text comparison techniques to compare them. Artists with a low average number of differences are considered by the system to not be truly engaging in one-to-one communications. Other techniques that enable some measure of general uniqueness to be determined also fall within the scope; the invention is not dependent on any particular technique for that functionality. In various further embodiments, there are ramifications of being considered to not engage in true one-to-one communications; for instance, in some embodiments, such artists are banned from being presented to users as potential targets of communication; in others there is a displayed list of artists who appear to tend to use “canned” responses; in others that individual is not enabled to initiate communications with non-artist users. In preferred versions of such embodiments an artist can denote a particular communication as being an announcement, and it would then be excluded from the described uniqueness checking.
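One possible form of the sampled pairwise comparison is sketched below using the Python standard library's difflib. Note the substitution: rather than a raw count of differences, the sketch uses a normalized dissimilarity (1 minus the SequenceMatcher ratio), and the threshold of 0.2, the sample size, and the names average_difference and looks_canned are all assumptions chosen for illustration. Communications marked as announcements are assumed to have been filtered out before the call.

```python
import difflib
import random


def average_difference(messages, sample_pairs=20, seed=None):
    """Mean dissimilarity (0 = identical, 1 = nothing in common) over random pairs."""
    if len(messages) < 2:
        return None  # not enough communications to compare
    rng = random.Random(seed)
    total = 0.0
    for _ in range(sample_pairs):
        first, second = rng.sample(messages, 2)
        ratio = difflib.SequenceMatcher(None, first, second).ratio()
        total += 1.0 - ratio
    return total / sample_pairs


def looks_canned(messages, threshold=0.2, sample_pairs=20, seed=None):
    """True when sampled communications differ too little to be one-to-one."""
    score = average_difference(messages, sample_pairs, seed)
    return score is not None and score < threshold


canned = ["Thanks for listening!"] * 5
varied = ["abcdefg", "1234567", "XYZUVW!"]
```

An embodiment could then apply any of the ramifications described above to artists for whom looks_canned returns True.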

Some embodiments provide UI functionality that allows the user to specify a genre or artist or other criteria for determining a subset of items, and then causes item recommendations to be selected from that subset.

Some embodiments enable recommendations to have their order at least partly determined by the similarity of the item to the items associated with some specified artist(s), item(s), or other grouping of items (such as an album of songs).

Some embodiments provide professional-interest-matching or dating services by examining files on the user's local computer, for instance words in documents, and possibly words in linked URLs where the links themselves are stored on the user's computer, to build interest profiles; neighbors and, in preferred such embodiments, item recommendations, are based on this data.

Some embodiments use a bar-code reader or other automatic means for identifying physical objects in order to generate, or as a contribution to, the data in the user's taste profile. For instance, music CD cases typically have bar codes that can be used for that purpose. (Note that a software product for the Mac OS X operating system, called Delicious Library, has the ability to take data supplied by a bar code reader to build a digital library of physical CDs and other items; however it has none of the other features described in this invention.)

Some embodiments add a gift suggestion feature. The individual for whom gift recommendations are to be made available makes his relevant data available to the machine associated with the user who wants to give a gift. For instance, in one such embodiment, an iTunes user might email his iTunes Music Library.xml file to the user who wants to give him a gift. Other techniques for getting the relevant information to the local user are equivalent from the standpoint of the invention. Then local processing occurs for that other user's data that is basically the same as for the local user's own data. For instance, in embodiments involving recommendations by means of neighbors, a collection of machine-generated neighbors is found relative to the gift recipient's data, and recommendations are generated from that and displayed on the screen. The value of this is that the local user already has the code necessary for such functionality, for his own recommendations; and in this case much of that same code is re-used for purposes of gift suggestions.

Some embodiments interact with an online music store in such a way that highly recommended music is automatically purchased at regular intervals of time. For example, on a monthly basis, an embodiment that works with the iTunes music store could cause the most recommended n songs, where n is 1 or some greater number, to be automatically purchased and downloaded to the user's machine. In preferred embodiments, the user is alerted before this occurs and given the choice to modify the list of songs to be purchased; for instance, the application software might display an alert dialog, the day before the purchase is to be made, which indicates that the top 10 songs will be purchased; input means such as checkboxes next to the listed songs may be used to indicate that certain songs should be excluded from the automatic purchasing. Preferred embodiments allow the user to choose the periodicity and number of songs to be automatically purchased. In some embodiments, this process is used to cause the creation of a physical CD by a store, containing recommended music (or, in other embodiments, videos, books, etc.), which is subsequently shipped to the user.

Preferred embodiments give the user control over which artists are considered to be part of the user's effective taste profile. For instance, in one embodiment the local user can view a list of the artists in his music collection; there is a checkbox next to each one, defaulting to checked; if it is unchecked, that artist is effectively ignored in other processing based on the taste profile. In the embodiment in question, this is accomplished by means of a tuned taste profile and an untuned taste profile; the only real use of the untuned one is to present that list to the user for tuning by unchecking checkboxes. So, in embodiments providing control over the artists that are considered to be effectively part of the taste profile, where the user's local taste profile is used for finding nearest neighbors, only the desired artists are used; and where taste profiles are part of the data broadcast by the system to be viewed by other users and/or for other users to choose neighbors, only the desired artists are used. In some embodiments different sets of artists may be chosen for finding neighbors of the local user and for broadcasting, but preferred embodiments combine those features. A software developer of ordinary skill in the art will immediately see other ways of handling the user interface and technical issues for achieving the same purposes; these are equivalent from the standpoint of the invention.

One embodiment involves online chat. An interest profile is built based upon a) the words the local user types into his chat client and/or b) the words that appear in messages typed by other people into the same chat room. In the case of (b) a subset may be used in which only the messages that are at least somewhat likely to be responses to messages from the local user are included, for instance as judged by distance in time from a message sent by the current user to the chat room where the potential response appeared. By collecting these words over time (and in some embodiments, giving words posted by other individuals less weight), a profile of chat interests can be built for each user. Then, when the system builds neighborhoods of similar users, those neighborhoods can be viewed as potential chat partners. In preferred embodiments a user clicks on a user identifier to start a chat session with that user. In some embodiments chat rooms are automatically initiated for groups of similar users. In chat embodiments, no other recommendations are necessary. Note that variants of this set of embodiments use different techniques to match people together according to the words they type. The simplest way is to treat the words used by a user as a document; then techniques for document similarity which take word frequency into account can be used. (A search on Google for “document similarity” will bring up numerous techniques.) But any technique that calculates useful similarities based on the word content is equivalent for purposes of the invention.
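As an illustration of the simplest approach described above (treating a user's words as a document and comparing word frequencies), the following is a minimal sketch in Python. The function names `word_profile` and `cosine_similarity`, the whitespace tokenization, and the choice of cosine similarity are assumptions made for this example only; any document-similarity technique could be substituted.

```python
import math
from collections import Counter


def word_profile(messages):
    """Treat everything a user typed as one document and count word frequencies."""
    counts = Counter()
    for message in messages:
        counts.update(message.lower().split())
    return counts


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two word-frequency profiles (0.0 to 1.0)."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[w] * profile_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in profile_a.values()))
    norm_b = math.sqrt(sum(c * c for c in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical chat histories for three users.
alice = word_profile(["jazz piano trio tonight", "love that jazz piano solo"])
bob = word_profile(["any good jazz piano records"])
carol = word_profile(["perl regex question"])

# Bob shares chat vocabulary with Alice; Carol shares none.
print(cosine_similarity(alice, bob) > cosine_similarity(alice, carol))
```

Words posted by other individuals could be given less weight, as mentioned above, by scaling their counts before they are added to the profile.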

Some embodiments provide means to restrict candidate neighbors by certain criteria such as physical locality. One way to do this is to simply assign the lowest possible similarity to people who don't meet the restriction requirements; another is to exclude them at the outset from the neighbor-searching process. Techniques to do this will be immediately apparent to a software developer of ordinary skill.

One advantage of having software running in user nodes is that certain parameters for recommendation quality can be tuned on the user node, for the given user, by computationally expensive techniques such as genetic algorithms. Some embodiments take advantage of this fact by using iterative testing, genetic algorithms, simulated annealing, or other optimization techniques to tune parameters such as the following: the number of neighbors to use in recommendation calculations (assuming only the most similar neighbors are chosen), the optimal adventurousness (see elsewhere in the specification for discussion of adventurousness), a cutoff release date for recommended items (for instance, the user may not be interested in old music), and others. One such other is a number representing the lowest weight to be associated with any user's information; the least similar of the nearest neighbors is assigned this weight and interpolation, with a maximum of 1 for the single nearest neighbor, is used to assign weights to the other neighbors according to their rank or another measure. The optimization may be based on tuning the parameters to get the best match between recommended music and the music actually already in the user's collection. (Obviously under normal processing, preferred embodiments do not recommend music that the user already has, and this screening is disabled for optimization purposes.) Preferred embodiments in the music domain try to optimize the match between ranks based on song plays per day and order of recommendation. For instance, Spearman's Rank Correlation can be used to do this. Some tuning operations may change the number of recommended songs; to find the optimal setting it may be useful to compute the p-value associated with each pair of rankings; the more statistically significant the p-value, the better. When rank correlation is used, preferred embodiments only consider the ranks of the top recommendations, because we are less interested in the exact rank of songs that are not particularly recommended. At an extreme of this general approach, some embodiments use Koza's Genetic Programming technique to generate at least part of the algorithm used in the recommendation process, using fitness criteria similar to the optimization measures mentioned already in this paragraph.
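Two of the quantities discussed in this paragraph can be sketched briefly in Python: the interpolated neighbor weights and Spearman's rank correlation between play-based ranks and recommendation order. This is an illustration under stated assumptions, not the tuning procedure itself: the function names are hypothetical, the interpolation is assumed to be linear in rank, and the correlation formula shown is the standard one for rankings without ties.

```python
def neighbor_weights(num_neighbors, lowest_weight):
    """Weight 1.0 for the nearest neighbor, lowest_weight for the least
    similar one, and linear interpolation by rank in between (an assumption;
    the text allows interpolation by rank or another measure)."""
    if num_neighbors == 1:
        return [1.0]
    step = (1.0 - lowest_weight) / (num_neighbors - 1)
    return [1.0 - step * rank for rank in range(num_neighbors)]


def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two rankings of the same items,
    assuming no tied ranks and at least two items."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Hypothetical data: the order in which five songs were recommended, and
# the rank of the same songs by plays per day in the user's collection.
recommendation_rank = [1, 2, 3, 4, 5]
plays_per_day_rank = [2, 1, 3, 5, 4]

weights = neighbor_weights(5, 0.2)
rho = spearman_rho(recommendation_rank, plays_per_day_rank)
print(weights, rho)
```

An optimizer would evaluate `rho` (or an associated p-value) for each candidate parameter setting, such as each value of `lowest_weight`, and keep the setting that scores best.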

In embodiments which carry out evolutionary computation like genetic programming, the invention has useful ramifications for multiprocessing. For instance, each user node evolves chromosomes (such as hierarchical programs in a genetic programming environment) which best suit the needs of the local user. It is likely that those same chromosomes will be relatively high-performing for other users who have the local user among their neighbors. So in preferred evolutionary computation embodiments, one or more of the highest-performing genomes that has resulted from the evolutionary process on a user node becomes part of the profile, which also includes the taste profile. Then other user nodes that select a particular user as a neighbor will also have his highest-performing genome(s) available. These can be used directly, combined with those supplied by other neighbors by (for example) averaging the recommendation strength for each song across all genomes, or seeded into the evolving population of genomes on that node; this is a form of multiprocessing evolutionary computation. It should not be construed that the invention is limited to the example of multiprocessing evolutionary processing described here; it is an example only. For instance, the literature on genetic programming is rich with research on ways to do genetic processing in a multiprocessor environment. Those skilled in the art of genetic programming will see numerous ways to leverage the fact that each user has a neighborhood of users who will tend to be well-served by many of the same genomes, and have user nodes that are available for multiprocessing to better serve the needs of all such users. For instance, without restricting genomes that are fed from other users into a local user's genome population to just the set of nearest neighbors, some embodiments give more or less probability to a foreign genome being added commensurate with the other user's similarity to the local user. Many variants of taking advantage of the similarity information and the overall structure of the invention will occur to those of ordinary skill in the art of genetic programming, and it must not be construed that such variants are not within the scope of the invention; that is, variants are within the scope if they result in better performance due to the following attributes of the invention: a) the fact that the mechanism that transports taste profiles from user node to user node (which may involve using the server as an intermediate step) can also be used to transport genomes, either as a separate data package or as part of the same data package, and b) that mechanism is set up so that profiles with a higher similarity to the local user have a higher probability of arriving sooner (and, in some embodiments, at all), and those genomes are more likely than randomly-chosen ones to have higher fitness for the local user.
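Two of the options mentioned in this paragraph can be sketched in Python: averaging recommendation strength across genomes supplied by neighbors, and admitting a foreign genome into the local population with a probability commensurate with the sending user's similarity. Everything here is an assumption made for illustration: genomes are modeled as plain callables that map a song identifier to a strength, similarity is assumed to lie between 0 and 1 and to be used directly as the admission probability, and the function names are hypothetical.

```python
import random


def average_strength(song, genomes):
    """Combine neighbors' genomes by averaging the recommendation strength
    each one assigns to a song (genomes modeled as callables)."""
    return sum(genome(song) for genome in genomes) / len(genomes)


def maybe_seed_genome(population, genome, similarity, rng):
    """Add a foreign genome to the local population with probability equal
    to the sender's similarity to the local user (assumed to be in [0, 1])."""
    if rng.random() < similarity:
        population.append(genome)
        return True
    return False


# Hypothetical genomes: two trivial scoring rules standing in for evolved programs.
genome_a = lambda song: 0.9 if "jazz" in song else 0.1
genome_b = lambda song: 0.5

rng = random.Random(0)
population = []
maybe_seed_genome(population, genome_a, similarity=1.0, rng=rng)  # always admitted
maybe_seed_genome(population, genome_b, similarity=0.0, rng=rng)  # never admitted
print(len(population), average_strength("jazz standard", [genome_a, genome_b]))
```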

In preferred embodiments, direct peer-to-peer communication of individual taste profile information occurs between neighbors. This can enable faster updating of neighbor taste profile data than would occur through the usual mechanism described in this specification. Further embodiments provide an output mechanism for showing identifiers of the digital item currently being experienced by other neighbors; in some such embodiments that information is also used to update the neighbors' taste profiles stored on the local node while waiting for full updated taste profiles to arrive through the usual mechanisms. In preferred embodiments which make use of peer-to-peer techniques as described in this paragraph, the fact that some nodes may be behind firewalls that prohibit incoming connections from being made is handled by sending the necessary data through other nodes that do have the necessary ports open. Any software developer of ordinary skill in the art of peer-to-peer network programming will immediately see how to create the necessary peer-to-peer mechanisms for the functionality described in this paragraph; it should not be construed that only particular implementation mechanisms are within the scope of the invention.

In preferred embodiments, users can create different taste profiles for themselves which fit different moods or interests. Most or all of the overall mechanism described in this specification then applies to each separate taste profile. Neighbors are found and recommendations are generated for each one. For instance, playlists generated in Apple's iTunes program can comprise music taste profiles.

In some embodiments at least some users run a special version of software that implements the invention, in which not all the usual user interface features are necessarily present. In these embodiments, certain musical tracks are indicated as being free of charge, for instance in the names of the files or in a database. The user is recommended a collection of free songs. Identifiers for the songs are then uploaded to a server system (not necessarily the same one as for other functions). Then the free songs are copied into a portable music player from a computer that is networked to that server. Then the portable music player is packed and shipped to the user. A web site is provided where the user orders and pays for the player, and is informed about how to get the software that will make the recommendations. In preferred embodiments a list of recommended free songs is presented to the user and the user can choose which ones he wants; identifiers of the chosen songs are sent to the server. Networking software and online store developers of ordinary skill will immediately see numerous ways to implement the required functionality; various implementations are equivalent from the standpoint of the invention; the scope of the invention is therefore not limited to certain implementation techniques. Note that this functionality may be separated from other aspects of the present invention; recommendations may be wholly made on the server based on data that is input, via the Web, to the server, using any recommendation methodology; the recommended songs are then loaded onto a portable player and shipped as described.

Some embodiments which involve artists having special accounts enable chat rooms for each artist, and provide indicators in the UI associated with artist names (such as next to the artist names in a list of artists) that show whether they are in the chat room or not; means are provided for the user to click or otherwise interact with an onscreen control to “enter” the artist's chat room and chat with the artist. Practitioners of ordinary skill in the relevant programming techniques will immediately see numerous ways to implement this and these are equivalent from the standpoint of the invention.

In some embodiments special taste profiles are created that are structured like user taste profiles, but actually are taste profiles for an item. For instance, in an embodiment which calculates user similarity based on the musical artists they have in common in their libraries, taste profiles are manually created for certain songs (such as songs that are sponsored by commercial interests) that mimic user taste profiles in the sense that each one contains a list of artists. Then the same similarity-calculating code can be used to find the songs that are most similar to the current user, and these may appear in a special recommendation list or mixed in with other recommendations.

GLOSSARY FOR APPENDIX 1

User Nodes—machines that are on the network that also directly interact with users; typically these are machines owned by users or associated with them at their work locations.

Screen scraping—a software process that reads an HTML (or other) page on the World Wide Web (or other network system) that is intended for human use, and extracts useful data from it for machine use.

Ghost user—data representing an individual that is derived from an external source such as a music blog. In many ways, ghost users may be treated identically to regular users of the system.

Current user, Target user, Local user—these terms represent the user who is running software which implements a client portion of the invention; typically he or she is the one with whom recommendations and one or more lists of neighbors are associated in the course of examples in this specification.

IM—instant message, typically associated with chat software.

Neighbor—may be used to indicate machine-generated neighbors and/or permanent neighbors. Note that different embodiments may use different terminology for these.

Nearest neighbors—the set of neighbors who are most similar in taste to the local user; normally the same as machine-generated neighbors, though it is not impossible that a user will manually find a neighbor that is actually more similar in taste than the machine-generated ones, and add him to permanent neighbors.

Artist—a creator of items of interest to the subject domain or one of the subject domains of an embodiment of the invention. We use the term for shorthand, and, for example, in some domains such as academic papers, it could refer to an academic who wrote or co-wrote such a paper.

UI—user interface. In most cases, the user interface will involve a computer with a CRT or flat-panel screen and a keyboard, displaying a windowing system such as Microsoft Windows or Mac OS X. Such systems normally provide standard means to create menus, lists (or tables), checkboxes, etc. In other cases the UI may be audio with input by means of telephone touch-tones. The requirement is that it provides functionality that facilitates human-computer interaction.

Item—an item is the basic unit of content, such as a song.

Interest profile or taste profile—data which is indicative of the interests or tastes of a user. The two terms are often used interchangeably in this specification. For instance, a digital music user will normally have identifiers of the songs he likes (or that are in his collection) in his taste profile.

Server—a server is a central computer, or networked group of central computers, that handles certain tasks for the benefit of the client nodes, such as storing a database containing login IDs, passwords, and profiles.

APPENDIX 2

This appendix provides another description of key functionality of the invention, including but not limited to facilitating retrieval of representations of nearest neighbor candidate taste profiles and associated user identifiers in an order such that said nearest neighbor candidate taste profiles tend to be at least as similar to a taste profile of the target user according to a predetermined similarity metric as are subsequently retrieved ones of said nearest neighbor candidate taste profiles.

This description is from U.S. provisional patent application 60/540,041, filed Jan. 27, 2004.

The specification describes a product named Goombah. However, the focus on Goombah is for clarity and descriptive purposes only and it must not be construed that the scope of the invention is limited to that particular embodiment or to the field that Goombah operates in (music).

Goombah's first purpose is to build a list of “nearest neighbors” for each user. They then form a community of like-minded people for communication purposes, and they also form a source for recommendations of items: if you have extremely similar tastes to me and you have an album I don't have and you play it all the time, I should probably give it a try. So that's the basis of the recommendations.

To find nearest neighbors exactly correctly is an O(N^2) problem if simple technology is used, and we hope to have hundreds of thousands or millions of users whose profiles are constantly being updated, so we wanted to do better than O(N^2).

There are probabilistic nearest neighbor algorithms that reduce this complexity hugely, but at a loss in reliability in finding the true nearest neighbors. We wanted to do better.

The key idea behind Goombah, whose purpose is to solve the above problem, is that the computations for finding the local user's nearest neighbors are carried out on that user's machine. So, if we have a million users, we have a million CPUs doing the work of finding nearest neighbors.

There are three reasons why such an approach is now within the realm of feasibility, where it wasn't a few years ago:

1) Most people who are heavy users of digital music have high-speed Internet connections; otherwise it would be unpleasant to do downloads from the likes of Apple's iTunes Music Store.

2) New technologies such as BitTorrent have emerged recently which offload bandwidth concerns for sending large files from a central server to the user nodes. In particular, the following is true for BitTorrent: The central server has a copy of the file that people need, but once one user has it on his machine, he is automatically set up as a server as well, and so on for every other user. Transfers are carried out from other users invisibly. (This is different from something like Napster, where you have to choose another user and request a download. Instead, the central server knows where all the copies of the files are, and tells a node that needs a copy the addresses of several machines to simultaneously get different chunks of the file from until the whole file is built. If a sending node drops out, other nodes automatically take its place, and so the file is eventually downloaded from multiple changing sources in a completely automated way.) This means it is possible for a company like Transpose to make very large files available to very large numbers of users without having hugely expensive server and bandwidth needs. Furthermore, it happens that BitTorrent is open source with a very friendly license and written in the same language (Python) that Goombah is written in.

3) Any serious digital music user already has a hard drive with gigabytes of space devoted to music, so spending 100 megs or more on the data associated with an application like Goombah is no big deal. In the future, videos will commonly be stored on user hard drives, so that is another application for Goombah as it evolves.

So, essentially the idea, when a local user wants to find his nearest neighbors, is to download the profiles of all other users who could reasonably be considered to be candidates to be nearest neighbors of that local user. Then, the local user's Goombah application does a search of all those profiles to find the best matches.

Instead of downloading individual profiles, Goombah will download a single very large file (10's or even 100's of megs) that contains the candidate profiles. This will happen by means of BitTorrent.

These large files will be formed by a clustering algorithm.

We will find clusters of similar users which are large enough to contain most reasonable nearest-neighbor candidates for each general type of musical taste. They will be large enough to fill that need, and small enough to download in a reasonable time on a high-speed connection and not take a problematic amount of space on the user's hard drive.

So, the local user will download a large BitTorrent file containing all nearest neighbor candidates and do an exhaustive search on his machine for nearest neighbors.

Then he can communicate with his taste-mates and get automated music recommendations from them.

The large file will be updated on a regular basis with further BitTorrent downloads.

The clustering algorithm can be any clustering algorithm that is capable of clustering a large number of users according to their degree of interest in a large number of subject items. (The degree of interest may be indicated by real-valued, binary, integer, or any other values that can represent a degree of interest.)

As one example, the commonly-used C4.5 algorithm can do this. For example, the open-source Java software WEKA has a module, weka.classifiers.trees.j48, which implements C4.5. In the context of using this module in a music setting, each user is an “Instance” and the song identifiers, such as strings containing the artist name, album name (if any), and song title, are used as the values of a “nominal attribute” representing the songs.

MISCELLANEOUS NOTES FOR APPENDIX 2

The step of using the local CPU to find nearest neighbors can be conducted in various ways. Any sub-algorithm which accomplishes the function “find nearest neighbors out of the downloaded large file” is considered equivalent for the purposes of the present invention. Possible ways to do it include an exhaustive search for the other users that are most similar to the local user according to some similarity metric. (The attached Python scripts, recommenderclass.py and tasteprofileclass.py, contain code for generating a similarity metric. However, it must be stressed that there are innumerable ways of generating a similarity metric for nearest-neighbor purposes, and they are all functionally equivalent from the standpoint of the present invention and all fall within the scope of the present invention. We can use any metric that results in reasonable likelihood that two users that are considered more “similar” than another pair of users actually have more shared interests in the targeted interest-domain [such as music] than another pair of users with lesser similarity. Note further that we aren't using the word “metric” in its most rigorous sense, but in its general sense as a quantity used for measurement and comparisons.)
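The exhaustive search mentioned above can be sketched in a few lines of Python. This is not the code from recommenderclass.py or tasteprofileclass.py, whose contents are not reproduced here; it is an illustration that assumes each taste profile is a set of artist names and uses the Jaccard coefficient as a stand-in similarity measure. The names `jaccard` and `nearest_neighbors` and the sample data are hypothetical.

```python
import heapq


def jaccard(profile_a, profile_b):
    """Stand-in similarity: shared artists divided by all artists in either profile."""
    if not profile_a and not profile_b:
        return 0.0
    return len(profile_a & profile_b) / len(profile_a | profile_b)


def nearest_neighbors(local_profile, candidates, k):
    """Score every candidate profile from the downloaded file against the
    local profile and return the identifiers of the k most similar users."""
    scored = (
        (jaccard(local_profile, profile), user_id)
        for user_id, profile in candidates.items()
    )
    return [user_id for _, user_id in heapq.nlargest(k, scored)]


# Hypothetical contents of a downloaded cluster file.
candidates = {
    "u1": {"Miles Davis", "John Coltrane"},
    "u2": {"Bill Evans", "Metallica", "Slayer", "Miles Davis"},
    "u3": {"Metallica"},
}
local = {"Miles Davis", "John Coltrane", "Bill Evans"}

print(nearest_neighbors(local, candidates, 2))
```

The search is linear in the number of candidate profiles, which is the cost each user node bears for its own user only.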

Another way to find the nearest neighbors from the downloaded large file is to use the vp-tree technique introduced by Peter N. Yianilos in his paper “Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces”. The large file to be downloaded would be formatted as a vp-tree and thus very fast nearest-neighbor searches would be facilitated on the local machine. Again, any technique used to find the nearest neighbors is functionally equivalent from the standpoint of the invention and falls within the scope of the invention.

The step of using peer-to-peer techniques for downloading the large files can also occur in various ways which are functionally equivalent from the point of view of the current invention. In fact, the invention does not depend on any particular technique for getting files from peers and all such techniques should therefore be considered functionally equivalent from the point of view of the invention. For instance, while BitTorrent provides a particularly compelling model for how this may be accomplished, Gnutella provides an alternative model.

A difference between the BitTorrent and Gnutella approaches is that with BitTorrent, each file has a distinct URL which is understandable by a server machine which runs BitTorrent “tracker” software. By means of this URL, client software is told by the tracker which peers store the file (or parts of the file) so that the client can cause downloads to be started from a subset (or all) of those peers. With the Gnutella approach, there is no central server, and the local computer sends queries into the “cloud” of known peers and machines known to those peers, looking for files with particular filenames. Then, normally, one of those peers is chosen to be the source of the download.

The commonality between all these various techniques is that the large files each represent a group of similar profiles (or, alternatively, all available profiles), there is a fixed set of such files at any point in time, and the user causes one (or more) to be downloaded that is (are) particularly likely to contain worthy nearest neighbor candidates; these files are usually downloaded from one or more peers rather than from a central server. All techniques which satisfy these requirements are functionally equivalent from the perspective of the present invention and thus fall within the scope.

One key step is determining which large file a particular client should download in order to meet the needs of its user. Of course, in embodiments where all the profiles are in one large cluster, there is no issue. When they are divided into clusters, and each cluster is represented by a particular large file, however, this step needs to be carried out.

One way to accomplish this step is as follows:

When a system is first set up to embody this invention, it will usually only have a relatively small number of users on Day 1. Thus, there is no need to divide the population into separate clusters for downloading. As the user population grows in size, a single file continues to be used for download purposes.

Finally a point may arrive at which it is deemed, due to the relative expense of bandwidth and disk space, that the user population should be divided into two clusters. At that time, a clustering algorithm is run and the user population is divided into two clusters. Each of the two clusters is given a name: for instance, “U0” and “U1”.

Now, as time goes on, we do not regenerate those clusters from scratch. Rather, as new users are added to the system, they are added to the most appropriate cluster. This may be done in any number of ways. A centroid for each cluster may be calculated, and the new user added to the cluster whose centroid it is most similar to. Or the average similarity between the user and each cluster member may be calculated for each candidate cluster, and the most appropriate cluster chosen on that basis. Or, the change in entropy that would arise in the system as a whole due to each possible choice of cluster can be calculated, and the choice taken that minimizes the change in entropy. Any of these techniques, and all other techniques that cause the user to be placed in one of the existing clusters, are functionally equivalent from the point of view of this patent as long as they put the user in a cluster that is highly likely to result in a good degree of similarity between the new user and other members of the cluster.
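The first of these options, adding a new user to the cluster with the most similar centroid, might look like the following Python sketch. The representation is an assumption made for illustration: profiles are sets of artist names, a centroid is the fraction of cluster members holding each artist, and similarity to a centroid is the mean of those fractions over the new user's artists. The function names and sample clusters are hypothetical, and each cluster is assumed to be non-empty.

```python
from collections import Counter


def centroid(profiles):
    """Fraction of a cluster's members whose profile contains each artist."""
    counts = Counter()
    for profile in profiles:
        counts.update(profile)
    return {artist: count / len(profiles) for artist, count in counts.items()}


def similarity_to_centroid(profile, cluster_centroid):
    """Mean centroid weight over the artists in the user's profile."""
    if not profile:
        return 0.0
    return sum(cluster_centroid.get(a, 0.0) for a in profile) / len(profile)


def assign_cluster(profile, clusters):
    """Return the name of the existing cluster whose centroid best matches."""
    return max(
        clusters,
        key=lambda name: similarity_to_centroid(profile, centroid(clusters[name])),
    )


# Hypothetical clusters after the first split.
clusters = {
    "U0": [{"Bach", "Mozart"}, {"Bach", "Chopin"}],
    "U1": [{"Coltrane", "Monk"}, {"Monk", "Mingus"}],
}

print(assign_cluster({"Mozart", "Bach"}, clusters))
```

In practice the centroids would be computed once per cluster and cached rather than recomputed for every new user.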

In this way, clusters have consistent meaning over time, and the user can stay in the same cluster until a further split is deemed necessary. In preferred embodiments, a split is handled by the expected large file simply not existing at a particular point in time; this is detected by the client, which thus assumes it needs a new cluster assignment. It then queries the server system for a new assignment. For a pre-existing user this is easily determined because the new assignment was made during the split process, so the server returns another cluster identifier consistent with that split. For example, if a user was in cluster U0, he may now be in cluster U01 (where the leading 0 represents the lineage). (Of course any cluster naming convention can be used, but preferred ones encode the lineage in the name.)
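A lineage-encoding naming convention of the kind described above, together with the client's fallback when its expected file no longer exists, can be sketched as follows in Python. The helper names, the single-character root prefix "U", and the use of plain dictionaries and sets in place of a real server and file store are all assumptions made for this illustration.

```python
def split_cluster_names(parent):
    """Names for the two clusters produced by splitting `parent`."""
    return parent + "0", parent + "1"


def lineage(cluster_name, root_prefix="U"):
    """All ancestors of a cluster, oldest first, ending with the cluster itself."""
    start = len(root_prefix) + 1
    return [cluster_name[:i] for i in range(start, len(cluster_name) + 1)]


def resolve_cluster(user_id, current_cluster, available_files, server_assignments):
    """Keep the current cluster while its file exists; otherwise ask the
    server (modeled here as a dict) for the assignment made during the split."""
    if current_cluster in available_files:
        return current_cluster
    return server_assignments[user_id]


# Hypothetical state after cluster U0 has been split into U00 and U01.
available_files = {"U00", "U01", "U1"}
server_assignments = {"alice": "U01"}

print(split_cluster_names("U0"))
print(lineage("U011"))
print(resolve_cluster("alice", "U0", available_files, server_assignments))
```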

Other embodiments which use a fast enough clustering approach regenerate the clusters from scratch on a regular basis. In such embodiments the client either requests a new identifier for the cluster file, or one is sent automatically by the server when the client and server are in communication. (Note that this communication can actually take a number of forms. Rather than sending text strings, numeric or other identifiers can be sent which are in turn used by the client to build the necessary handle to access the file. Two examples: In a Gnutella-style system, this handle would probably be a search term. In a BitTorrent-style system, the handle might be the URL for the torrent.)

Still other embodiments have relatively stable clusters but continuously work to refine them by moving users from one cluster to another if such a movement provides superior clustering. For instance, periodically each user may be considered again as if it were a new user, and a decision made about what cluster it should go into. If it changes then that will be reflected in future communications between the client and the server (although the change does not need to be reflected immediately).

In some embodiments, the client has no persistent “knowledge” about what cluster the user is in, and when it is time to get a new cluster, queries the server for the information required to start a download of the appropriate one.

In some embodiments, users may be assigned to more than one cluster. As one example of how that might be done, a number of standard clustering approaches, such as C4.5, assign probabilities for cluster assignments; thus a user might with a higher probability reside in one cluster than another. It would be possible to take the two clusters with the highest probability for a given user, and say that he resides in both of them. The invention is not limited to any particular approach to putting users in more than one cluster. The functionality is simply that the user would go in the clusters that provide a high match to his interests, and any technique that accomplishes that is functionally equivalent from the perspective of the present invention and is therefore within the scope.
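Taking the two most probable clusters for a user can be sketched in Python as below. The input is assumed to be a mapping from cluster name to assignment probability, however the clustering approach produced it; the function name `top_clusters` and the sample probabilities are hypothetical.

```python
def top_clusters(probabilities, how_many=2):
    """Return the names of the clusters with the highest assignment
    probability for one user, most probable first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:how_many]]


# Hypothetical assignment probabilities for a single user.
user_probabilities = {"U1": 0.10, "U00": 0.55, "U01": 0.35}

print(top_clusters(user_probabilities))
```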

In some embodiments, different clustering arrangements exist for different genres. For example a user who has both classical music and jazz in his collection might benefit from different nearest-neighbor communities generating different recommendations in each area. So, the entire clustering and downloading structure and steps, in some embodiments, are carried out more than once. In other (preferred) embodiments, each user still is in only one (or a small group of) cluster(s), but his client software finds different nearest neighbor sets, depending on genre, from within those clusters. Of course, in non-music applications, this concept is extended by means of the analogous principle to “genre” that exists in that other subject area. For instance, if the items are weblogs, then an individual might be interested in weblogs about Perl scripting and also weblogs about Republican politics. These different subject areas are handled analogously to genres in the music world.

In order for the system to respond to the needs of users who are continually buying new music (viewing new weblogs, etc.), in preferred embodiments it is possible for neighborhoods to be updated accordingly. This means that the large files representing clusters need to be either re-downloaded or updated periodically. We will discuss below some of the ways this is accomplished in various embodiments. The scope should not be construed to be limited to these particular techniques. Rather, any technique that “enables the potential neighbor files to be updated or replaced often enough to increase the accuracy and pleasure in using the system” equally fulfils the required function and is thus considered to be in the scope.

In some embodiments, download file identifiers (which may be URL's, terms, etc.) are constructed based on two pieces of data: the cluster identifier plus the date. For instance, a user might be in cluster U011. If the date is Jan. 27, 2004, the download file identifier might be U01120040127. The client can then get an update by, for instance, downloading the file containing that string in its name or by constructing a BitTorrent URL based on that string.
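As an illustration only, the identifier scheme of the preceding paragraph might be sketched as follows. The function name `download_file_id` is hypothetical; the only facts taken from the description are that the cluster identifier is followed by the date written as year, month, and day.

```python
from datetime import date


def download_file_id(cluster_id: str, day: date) -> str:
    """Concatenate the cluster identifier with the date as YYYYMMDD.

    Hypothetical helper: the specification only gives the example
    U011 + Jan. 27, 2004 -> U01120040127.
    """
    return f"{cluster_id}{day:%Y%m%d}"


print(download_file_id("U011", date(2004, 1, 27)))  # U01120040127
```

How the identifier is then turned into a file name or a BitTorrent URL is left open here, as it is in the description.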

The client machine can then download the file upon whatever schedule is most consistent with the user's needs and desires. Bandwidth will be a constraint, so there is reason not to download the files too frequently. In preferred embodiments, there is a choice in the “preferences” section of the program whereby the user can specify how often he wants to update the file. He will probably do so less frequently if he has a dialup modem connection than if he has a cable modem. Some embodiments use information available in the computer (for instance, provided by the operating system) to determine the connection speed, and automatically choose a download schedule accordingly. Some ask the user to specify the download speed and automatically choose a download schedule accordingly. Other ways of determining a download schedule, including the user's manually starting each download, are all functionally equivalent and within the scope.

Some embodiments automatically cause files of different sizes to be downloaded according to connection speed (or at the choice of the user). One way this is done is for the server to store a tree of cluster arrangements. For instance, suppose clusters are arrived at by splitting bigger clusters in half, and the lineage of the cluster is represented in the file name. Then, for example, U0 might be the parent of U01, and U01 might be the parent of U011. Then a client with less bandwidth available to it might retrieve cluster U011 and one with a great amount of bandwidth, but with a user with a very similar taste profile to the first client, might retrieve cluster U0. The difference is that the larger the downloaded cluster, the more likely it is that the true most similar neighbors, out of the whole universe of neighbors, will be found by the client.
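A minimal sketch of choosing a cluster from such a lineage is given below. It assumes, following the U0, U01, U011 example, that the root identifier is two characters long and that each split appends one character. The function names and the bandwidth thresholds (in kilobits per second) are hypothetical and are not part of the description.

```python
def cluster_lineage(leaf_id: str, root_len: int = 2) -> list[str]:
    """Return the ancestors of a cluster from the root down to the leaf.

    Assumes each split appends one character to the parent's name,
    e.g. "U011" -> ["U0", "U01", "U011"].
    """
    return [leaf_id[:n] for n in range(root_len, len(leaf_id) + 1)]


def pick_cluster(leaf_id: str, kbps: float,
                 thresholds: tuple[float, ...] = (256.0, 2000.0)) -> str:
    """Pick a larger (shallower) cluster as available bandwidth grows.

    Each threshold the connection meets moves one level up the tree.
    The threshold values are illustrative assumptions.
    """
    lineage = cluster_lineage(leaf_id)
    climb = sum(kbps >= t for t in thresholds)
    index = max(0, len(lineage) - 1 - climb)
    return lineage[index]


print(pick_cluster("U011", 56))    # U011
print(pick_cluster("U011", 5000))  # U0
```

A dialup client thus stays with its smallest cluster, while a client with ample bandwidth retrieves an ancestor that contains it.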

In some (preferred) embodiments it is possible to either download a cluster as a whole, or download updates. For instance, using the naming convention we have used above, U01120040127-20040126 might be the identifier of the file that contains the difference data between an up-to-date representation of cluster U011 as it appeared on Jan. 26, 2004 and the version that was current on Jan. 27, 2004. Then a preferred embodiment will automatically choose whatever method will result in getting current more quickly. For instance, if no update has occurred in a number of days, it may be more efficient to download the complete file. But if the last update was recent, it may be more efficient to download a series of daily updates.
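The choice between a complete download and a series of daily updates might be sketched as follows. This is an illustration under stated assumptions: the client is assumed to know the size of the complete file and of each daily update in advance (the description does not say how), and "getting current more quickly" is approximated by comparing total bytes. The names `update_file_id` and `plan_download` are hypothetical.

```python
from datetime import date, timedelta


def update_file_id(cluster_id: str, new_day: date, old_day: date) -> str:
    """Identifier of the diff from old_day to new_day, e.g. U01120040127-20040126."""
    return f"{cluster_id}{new_day:%Y%m%d}-{old_day:%Y%m%d}"


def plan_download(cluster_id: str, last_sync: date, today: date,
                  full_size: int, daily_sizes: dict[date, int]) -> list[str]:
    """Return the file identifiers to fetch, choosing the smaller total.

    daily_sizes maps the "new" date of each available daily diff to its
    size in bytes (an assumed input). If any needed diff is missing, or
    the diffs together are not smaller than the full file, the complete
    cluster file for today is chosen.
    """
    gap = (today - last_sync).days
    days = [last_sync + timedelta(days=i) for i in range(1, gap + 1)]
    if not days:
        return []  # already current
    have_all = all(d in daily_sizes for d in days)
    if have_all and sum(daily_sizes[d] for d in days) < full_size:
        return [update_file_id(cluster_id, d, d - timedelta(days=1)) for d in days]
    return [f"{cluster_id}{today:%Y%m%d}"]
```

For example, a client last updated on Jan. 25, 2004 that finds small diffs for Jan. 26 and Jan. 27 would fetch U01120040126-20040125 and then U01120040127-20040126; a client that is a month behind would simply fetch U01120040127.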

In a preferred embodiment making use of BitTorrent, the server stores, for each cluster, files representing the current complete cluster, individual updates for the last 6 days, and the last 4 weekly update files (files that update for a whole week). BitTorrent requests for any of these files cause them to be loaded to client machines, where they are henceforth made available in a peer-to-peer manner. Any such manner of scheduling updates is functionally equivalent.

Those skilled in the art will know how to create such update files. There are general “patching” software technologies, but more particularly it is easy to create custom approaches. For instance, if the cluster file contains a list of user ID's with each user ID followed by a list of the songs found on his or her computer, an update file may consist of a list of user ID's of users who downloaded new songs in the corresponding time interval, with each user ID followed by a list of the new songs and a list of songs that used to be on the user's disk and no longer are. All such representations are functionally equivalent and fall within the scope of the invention.
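One possible in-memory form of the custom update file just described is sketched below. The representation (a mapping from user ID to a set of song identifiers, and an update that maps each changed user to a pair of added and removed songs) and the function names are assumptions made for illustration. Users who leave a cluster entirely are not handled in this sketch.

```python
Cluster = dict[str, set[str]]
Update = dict[str, tuple[set[str], set[str]]]


def make_update(old: Cluster, new: Cluster) -> Update:
    """List, per changed user, the songs added and the songs no longer present."""
    update: Update = {}
    for user, songs in new.items():
        before = old.get(user, set())
        added, removed = songs - before, before - songs
        if added or removed:
            update[user] = (added, removed)
    return update


def apply_update(cluster: Cluster, update: Update) -> Cluster:
    """Return a copy of the cluster with the update applied."""
    result = {user: set(songs) for user, songs in cluster.items()}
    for user, (added, removed) in update.items():
        songs = result.setdefault(user, set())
        songs |= added
        songs -= removed
    return result
```

Applying `make_update(old, new)` to `old` reproduces `new` whenever every user in `old` is still present in `new`.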

Another aspect is the fact that changes on the user's machine need to be uploaded to the server. In some embodiments this is done on a regular schedule when there are changes to upload. Preferred embodiments only send changes since the last upload rather than uploading the entire interest profile. Preferred embodiments don't send changes until sufficient changes have accrued that it is “worthwhile” to do an update. For instance, in embodiments where taste profiles include information about the number of times a song has been played, it makes a big difference when that count goes from 0 to 10, but very little difference when it goes from 1000 to 1001. A simple way to determine significance is to have a cutoff for the percentages involved. For instance, if play counts are used, then if overall they have changed by 1%, that might be considered significant. If simple presence/absence data is used, then a 1% difference in that data might be considered significant. Alternatively, the entropy of the data may be used. For instance, entropy can be calculated based on the exercise of choosing a “play” at random, and computing the probability that such a randomly chosen play instance would arrive at a particular song. So there is one probability for each song. Based on those probabilities the song entropy may be calculated. Then significance may be determined by a particular amount of change in entropy occurring, either on a percentage basis or based on a fixed minimum change in value. Any technique that determines that a desirable amount of change has occurred is considered functionally equivalent from the standpoint of the invention and thus falls within the scope.
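The two significance tests described above might be sketched as follows. This is one reading among several: the overall percentage change is taken here as the summed absolute change in play counts divided by the previous total, and the entropy is the Shannon entropy (in bits) of the per-song play probabilities. The function names and the default thresholds are assumptions.

```python
import math


def play_entropy(play_counts: dict[str, int]) -> float:
    """Shannon entropy, in bits, of picking a play at random and landing on each song."""
    total = sum(play_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in play_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def percent_change(old: dict[str, int], new: dict[str, int]) -> float:
    """Summed absolute change in play counts as a fraction of the old total."""
    songs = set(old) | set(new)
    changed = sum(abs(new.get(s, 0) - old.get(s, 0)) for s in songs)
    base = sum(old.values())
    if base == 0:
        return math.inf if changed else 0.0
    return changed / base


def is_significant(old: dict[str, int], new: dict[str, int],
                   pct: float = 0.01, min_entropy_delta: float = 0.05) -> bool:
    """Upload when either test reports enough change (thresholds are illustrative)."""
    entropy_delta = abs(play_entropy(new) - play_entropy(old))
    return percent_change(old, new) >= pct or entropy_delta >= min_entropy_delta
```

With these defaults, a single song going from 1000 plays to 1001 is not significant, while a previously unplayed song reaching 10 plays is.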

In some embodiments the user can determine how much significance is required before an update occurs; in others it is automatically determined based on bandwidth; in others it is determined on a global basis by the server; in others some combination is used, such as a maximum upload frequency being determined by the server with the user having the ability to set the frequency or significance required as long as it is below the global value; any number of other techniques are possible and considered functionally equivalent within the scope of the invention.

Note: Music is discussed in this specification for reasons of example only. The invention applies to other areas just as well, including text documents, videos, weblogs, and indeed any type of item where user interest can be determined by means of his association, and/or degree of association, with a number of items of potential interest. Software developers will readily see how to create these alternative embodiments. It must not be construed that the invention is limited to the specific examples described in this specification.

The overall invention, in broadest form, consists of a server (or networked group of servers) that stores the cluster files containing interest profiles and distributes them to client machines, and client machines that then distribute those files to other client machines; the nearest neighbors are then chosen on client machines and used for purposes of recommendation and community.

Clusters should be large enough to include most users whose profiles are reasonably likely to be global “nearest neighbors” for any given local user.

It would be worthwhile to discuss one further sample application of the technology. That is one where users are purchasers of DVD's for viewing videos. The interest profile would consist of the list of DVD's owned by the user (perhaps with additional entries that are liked or particularly disliked by the user), optionally associated with the ratings. Numerous technologies are available for finding nearest neighbors based on such data, such as those used by Firefly or the movie recommendation patents of John Hey, or the present inventor's U.S. Pat. No. 5,884,282. (All such algorithms are functionally equivalent from the standpoint of the present invention.) This profile data is usually manually entered by the user.

In addition to forming communities and recommendations as already described, this embodiment adds functionality for making it visible to other users that one has DVD's one is willing to lend out, and for keeping track of DVD's that have been lent. Additionally, preferred embodiments have functionality for rating lenders of DVD's according to their reliability (much as is done on eBay or various auction sites with respect to sellers). Skilled practitioners of the art of Web programming will immediately see how to create appropriate user interfaces.

In some embodiments this lending data is stored on the server for easy access by various clients, and in others it is made available by peer-to-peer means.

The idea is that when the system finds people who have similar tastes, they will be able to help each other by lending DVD's to each other. Because they have similar tastes, they will be able to lend multiple DVD's. They may also email each other or chat with each other about DVD's of interest through addresses made available through the interface or through automatic means. These factors lead to a relationship of trust, which minimizes the risk in sharing DVD's. So such a service has the potential to do what Netflix does, but, since there is no central repository of DVD's, at much lower cost.

Of course, physical objects of interest other than DVD's are the subject of other embodiments; CD's are one applicable subject area.

APPENDIX 3

Introduction (Appendix 3)

This appendix describes another way of implementing key functionality of the invention, including but not limited to facilitating retrieval of representations of nearest neighbor candidate taste profiles and associated user identifiers in an order such that said nearest neighbor candidate taste profiles tend to be at least as similar to a taste profile of the target user according to a predetermined similarity metric as are subsequently retrieved ones of said nearest neighbor candidate taste profiles.

The representations mentioned in the previous paragraph may be the user profiles themselves (including the taste profiles), or just the taste profiles (which should include an identifier of the user), or they may be user ID's of the users, or URL's enabling the data to be located on the network, or any other data that allows taste profiles and associated user ID's to be accessed. These are all functionally equivalent from the standpoint of the invention.

So that it may be taken separately, this Appendix describes the invention anew.

The present invention is a new approach to dynamically creating online groups of similarly-minded people for both community-building and generating recommendations of items of interest to the communities.

The invention is a form of distributed computing for searching which we will refer to as “distributed profile climbing” or “DPC”. In preferred embodiments it is a kind of middle ground between a server-based Internet service and a peer-to-peer one.

The invention consists of a networked computer system running special software. The network is typically the Internet (but can be any network which interconnects computers) and the computer can be any of a broad range of computer hardware that a user might own, a typical personal computer running with 256 megabytes of RAM and a Pentium processor being one example. The connection to the network may be wired or wireless, based on Ethernet cabling, radio, light, etc.

Distributed Profile Climbing

Peer-to-peer networks are a popular way to handle such challenges as sharing files between many users. The main problem is that not everyone who wants to participate in such a network can do so fully. This is for a number of reasons: computers may not be on all the time, or they may be portable, or they may have firewall and/or network address translation issues.

Pseudo-peer-to-peer networks handle that problem by creating proxies for the machines of each user who wants to participate. These proxies exist on server systems, but typically the technical requirements for those servers are light because the proxies merely store and transmit data related to the machine they are proxying.

An example of this is Radio UserLand's “upstreaming”. Radio UserLand is a software package that runs on end-user computers and lets users create weblog entries. Those entries may then be sent (“upstreamed”) to UserLand's servers. Web users who wish to view a Radio UserLand customer's weblog can then look at the proxy data on UserLand's servers. Note that, in a world where everyone had computers always able to allow access to other users, there would be no need for this upstreaming to take place. Each weblog writer's machine could serve their weblogs to the rest of the world. But we are not in such a world, so the practical solution is to send the weblog data somewhere where it can always be available to other people, in the form of a data object which is located at a particular URL on a reliable server. This data object is the proxy for the user's machine.

DPC networks share a common foundation with pseudo-peer-to-peer networks like Radio UserLand in the sense that each user's data is represented by a proxy data object located on a remote server. However, in DPC networks, this data contains a profile of the user so that similarity of interests can be compared. In preferred embodiments, the proxy object for a user further contains key information for other users who have already been found to be similar in interests to that user. This key information is sufficient to enable the proxies of those other users to be accessed (typically, this would be by means of constructing a URL that accesses the proxies).

One very important aspect of searching for similar profiles is intelligently handling users that have already been compared at least once. In some cases, it may be desired to never compare them again; in others it may be desired to compare them again after a certain amount of time or a certain number of updates have occurred. Most approaches for taking care of this involve storing representations of which pairs of profiles have already been compared.

For instance, some solutions store a table with a concatenated key containing the logon ID's of the two users that have been compared. But this is a problem. If we assume that over time every user will be compared to every other (ignoring the expense of those comparisons for now) and there are 10,000,000 users in the database, the result is a table with 100,000,000,000,000 records. That is not within the realm of reasonable possibility for affordable server installations.

However, now assume there are 10,000,000 users each with their own machine, and each machine stores the logon ID's of the approximately 10,000,000 users it may have been compared to over time. This is entirely within reason given that most computers being sold today are equipped with 10's of gigabytes of storage. This is the way DPC handles the problem, in embodiments which involve such lists. Preferred such embodiments contain the calculated similarity metric for each comparison as well as the date and time of the comparison, and other pertinent information may be included as well.
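The arithmetic behind this comparison can be checked with a few lines. The figure of 16 bytes per previously-checked entry (for example an 8-byte ID, a 4-byte similarity value, and a 4-byte timestamp) is purely an illustrative assumption and does not come from the description.

```python
def pair_table_rows(users: int) -> int:
    """Rows in a central table keyed on (user, user) if everyone is compared to everyone."""
    return users * users


def per_client_bytes(users: int, bytes_per_entry: int = 16) -> int:
    """Storage for one client's previously-checked list (entry size is assumed)."""
    return users * bytes_per_entry


USERS = 10_000_000
print(pair_table_rows(USERS))         # 100000000000000
print(per_client_bytes(USERS) / 1e9)  # 0.16 (gigabytes)
```

Under that assumed entry size, each client needs well under one gigabyte, whereas the central table would need 10,000,000 times as many rows as any single client stores.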

Moreover, for embodiments that handle previously-checked lists, there is no need for the kind of very sophisticated, highly scalable database software that would be required to store that data on a central server.

Furthermore, in most DPC systems, the similarity metrics are computed on the users' machines rather than on the server. This is not a requirement, but it does help to distribute the workload and simplify the scalability issues for the server.

As a matter of practical implementation, preferred embodiments where there are large numbers of users divide the proxies for various users among separate servers residing in one or more physical hosting sites. Usually the proxies are divided up in such a way that a hash function based on the user's ID can be used to determine which server (or subgroup of servers) hosts that user's proxy. The benefit of dividing the server side up this way is one of simplicity and cost: there is no need for a high-performance central database system. Instead the servers can operate in relative isolation from each other, even storing all data in local RAM for speed, and communicating with other server hardware only for control and backup purposes.
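A minimal sketch of such hash-based placement follows. The choice of SHA-256, the modulo mapping, and the host names are assumptions for illustration; the description only requires that a hash of the user's ID determine the hosting server.

```python
import hashlib


def server_for_user(user_id: str, servers: list[str]) -> str:
    """Map a user ID to the server (or server subgroup) hosting that user's proxy.

    A stable hash is used so that every client computes the same answer;
    Python's built-in hash() is not suitable because it varies per process.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Hypothetical host names.
SERVERS = ["proxy0.example.net", "proxy1.example.net", "proxy2.example.net"]
host = server_for_user("some_logon_id", SERVERS)
```

A simple modulo mapping moves most users when the number of servers changes; whether that matters depends on the embodiment, and the description does not address it.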

An algorithm for one embodiment of the invention is shown below. Steps are carried out in the order shown. Deeper indentation is used in the representation of repeated groups of operations, or operations that are dependent on the result of an “if” test. An “else” relates to the previous “if” at the same indentation level. A “break” causes the process to immediately terminate the currently innermost loop, while allowing outer loops to continue undisturbed. The operations depicted are carried out by the software operating on end-user machines, except that the server is invoked to provide data on occasion.

First we will introduce some terms. THISUSER is the user whose machine the algorithm is running on. Each user has an associated NEIGHBORBAG which is his current list of ID's of similar users. In this example embodiment, the NEIGHBORBAG has a fixed maximum size. PREVIOUSLYCHECKEDBAG is a collection of users that have already been checked as potential neighbors (members of NEIGHBORBAG).

In the example which will follow, all similarities are between 0 and 1, and higher similarities are better. When similarities between THISUSER and another are considered, it is implied that one of the following happens: a) the user's machine requests that the server send the other user's taste profile, such as an encoded version of the relevant data from his iTunes Music Library database, and the taste profiles of the two users are compared on THISUSER's machine, or b) the server compares the two users using that same data and returns the result to THISUSER's machine. The former has the overhead that the profiles need to be sent to the user's machine, which consumes network bandwidth. The latter adds more work that must be done on the server side, increasing the complexity of the server. Different embodiments need to trade off these factors.

repeat as long as THISUSER is online:

    -   ask the server for the ID of a random, already-existing user; set N to be this returned ID
    -   set PREVCLIMBER to null; set PREVSIMILARITY to 0
    -   repeat:
        -   if N is a member of THISUSER's PREVIOUSLYCHECKEDBAG, and was added to it <6 months ago:
            -   break
        -   ask the server for N's NEIGHBORBAG; save it in CLIMBERBAG
        -   set C to be the member of CLIMBERBAG that is most similar to THISUSER
        -   add all members of CLIMBERBAG that are not already there to THISUSER's PREVIOUSLYCHECKEDBAG
        -   set CSIMILARITY to C's similarity to THISUSER
        -   if CSIMILARITY>PREVSIMILARITY:
            -   set PREVSIMILARITY to CSIMILARITY
            -   set PREVCLIMBER to C
            -   set N to C
        -   else:
            -   if there are any members of THISUSER's NEIGHBORBAG that have a similarity to THISUSER that is <PREVSIMILARITY:
                -   if the maximum size for NEIGHBORBAG has been reached:
                    -   remove the member of THISUSER's NEIGHBORBAG which has the least similarity to THISUSER
                -   add PREVCLIMBER to THISUSER's NEIGHBORBAG
            -   break
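For illustration, one pass of the inner loop of the pseudocode above might be rendered in Python as follows. This is a sketch under stated assumptions, not a definitive implementation. The "server" is replaced by two in-memory dictionaries (`profiles` and `bags`), similarity is taken to be set overlap, and six months is approximated as 182 days. Two interpretive departures from the literal pseudocode are made and marked in comments: users first checked during the current climb do not stop that climb, and a candidate is also admitted while the NEIGHBORBAG is not yet full. All names are hypothetical.

```python
import time

SIX_MONTHS = 182 * 24 * 3600  # seconds; an approximation


def overlap(a: set, b: set) -> float:
    """Assumed similarity metric in [0, 1]: shared items over all items."""
    pool = a | b
    return len(a & b) / len(pool) if pool else 0.0


def climb_once(me, start, profiles, bags, my_bag, checked, max_size, now=None):
    """Climb from `start` toward users more similar to `me`.

    profiles: user ID -> set of item IDs (stands in for server-held profiles)
    bags:     user ID -> list of neighbor IDs (each user's NEIGHBORBAG)
    my_bag:   THISUSER's NEIGHBORBAG as user ID -> similarity (mutated)
    checked:  PREVIOUSLYCHECKEDBAG as user ID -> time added (mutated)
    """
    now = time.time() if now is None else now
    n, prev_climber, prev_sim = start, None, 0.0
    fresh = set()  # assumption: entries added during this climb do not end it
    while True:
        seen_at = checked.get(n)
        if n not in fresh and seen_at is not None and now - seen_at < SIX_MONTHS:
            break
        climber_bag = [u for u in bags.get(n, ()) if u != me]
        sims = {u: overlap(profiles[me], profiles[u]) for u in climber_bag}
        for u in climber_bag:
            if u not in checked:
                checked[u] = now
                fresh.add(u)
        best = max(sims, key=sims.get) if sims else None
        if best is not None and sims[best] > prev_sim:
            prev_sim, prev_climber, n = sims[best], best, best
            continue
        # No improvement: try to admit the best user found on this climb.
        if prev_climber is not None and prev_climber not in my_bag:
            weakest = min(my_bag, key=my_bag.get, default=None)
            if len(my_bag) < max_size:  # assumption: fill the bag first
                my_bag[prev_climber] = prev_sim
            elif weakest is not None and my_bag[weakest] < prev_sim:
                del my_bag[weakest]
                my_bag[prev_climber] = prev_sim
        break
    return prev_climber, prev_sim
```

The outer loop of the pseudocode would call `climb_once` repeatedly with a randomly chosen `start` for as long as the user is online. Because the similarity must strictly increase at every step, each climb terminates.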

Note that this invention must not be construed as being limited to the algorithm above, which is presented merely as one of the simpler ways of implementing the invention.

However, all approaches that fall within the scope of the invention have in common that profiles arrive at the client node in an order that tends to deliver the profiles most similar to the current user first. Accordingly, processing is included above whereby a profile isn't retrieved again until a sufficient time period has passed for the profile to have appreciably changed. In the short term, the most similar matches will exhaust themselves and less similar matches will follow.

At the beginning the retrieved profiles are essentially random, but the process quickly “climbs” to strong matches. The process therefore will not retrieve profiles in exactly the ideal order, and in general the techniques used do not do so. This method will retrieve profiles in a good enough order that, once climbing has reached a high level of similarity and profiles are not being retrieved because they already have been, we have the required general decreasing similarity.

The climbing is accomplished by means of calculating the similarity metric with respect to the nearest neighbors of a user for which the similarity has previously been calculated, where the latter was found to be at a level high enough that it is worth the expense of going on to retrieve the interest profiles for that user's neighbors to determine whether one or more of them will have an even greater similarity to the target user.

Some peer-to-peer networks, such as the Morpheus file-sharing network, have an architecture which causes data which would traditionally be stored on a server to instead be stored on a subset of user computers. We will refer to such servers, in the context of this invention (not necessarily in the Morpheus context), as user-associated servers. In the conduct of the illegal file trading of copyrighted files, the main “advantage” of this technique is arguably that there is no company which controls the master index and which can therefore be prosecuted or sued.

However, from the point of view of the present invention, there is another reason, and that is to completely (or almost completely) eliminate the expense associated with a central server. If there is a central server (or server network separate from user-associated servers), then some entity has to pay for maintaining it, providing the bandwidth, etc. Without one, that necessity disappears. Eliminating that necessity enables this invention to be embodied, in a sense, in “pure software” such as an open-source software project, instead of needing to embody it in a project run as a business in order to pay for the servers. Based on the experience of the file-sharing networks, there are enough users who do not have severe firewall or connectivity issues and who are willing to help others by making their resources available that this is a feasible solution. Moreover, unlike file sharing networks, there is little real problem if a user-associated server becomes temporarily or permanently unavailable, because the searching is normally done in the background rather than in real-time.

Note that this specification has already described how a hash of the user's ID can be used to determine which server to access for his data. In order to extend that to using user-associated servers, more is required (and the already-described hash may or may not be part of that).

In one set of embodiments there is still a central server but, rather than serving the taste profiles, it contains a list of identifiers which can be used to construct the URL's where the taste profile for each user may be found. So the actual amount of data that needs to be stored on, and sent from, the server is far less than in the earlier description. For many implementations, the load will be light enough that a single desktop computer with cable modem or DSL (or similar) connection to the Internet will be enough.

The Gnutella network, for example, provides a “cloud” of user-associated servers, many or all of which store the URL's (or data that can be used to construct the URL's) of many or all of the other user-associated servers. When a user obtains Gnutella-compliant software (whether by download or by other means) it normally is distributed with a list of user-associated servers that are frequently available. The software then contacts those servers, and can get lists from them of other such servers. The local node is then updated with this information, and it is available to other nodes that might eventually contact this node. Thus, no single central server is required.

This specification will not describe the construction of such networks in detail; rather the technical descriptions for Gnutella and other such networks, readily found online using such search tools as Google, should be used. Use such existing networks as a model for constructing a “cloud” of nodes which point to each other and obviate the need for a central server.

Preferred embodiments of the invention where the profile data is stored on user-associated servers generally use the same computers for storing that data as are used by their associated users as their day-to-day computers, with the exception that they must be accessible to inbound connections (i.e., few if any firewall or NAT issues should apply and they should be connected to the Internet, and turned on, a substantial amount of the time).

Each user-associated server stores the profiles and neighbor lists of a number of other users. For preferred such embodiments, the step of retrieving a random user ID is modified so that instead of asking a central server, first a random user-associated server in the cloud (or semi-random, influenced by the fact that only a subset of the cloud may be known to the node at the time) is chosen, and then that server is asked to provide a random user ID of those whose profiles and neighbor lists are stored on that computer. Then the algorithm proceeds as before, with the exception that instead of retrieving just the ID of other users, enough data is retrieved to construct a URL where that user's information is available. Then it is accessed at that location. Further, if an access fails because the URL doesn't respond or the data that is supposed to be there isn't, a “break” is executed and the innermost loop explicitly spelled out in the pseudocode is exited.

Further embodiments lower the percentage of times non-response or not-found errors occur by providing multiple URL's where the same data can be found on different user-associated servers. Then if one fails, one or more fallback machines can be tried.
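The fallback behavior might be sketched as follows. To keep the example self-contained, the transport is passed in as a callable named `fetch` (a hypothetical parameter); in practice it could wrap an HTTP request. Returning `None` corresponds to the case where every location fails and the “break” described earlier is executed.

```python
from typing import Callable, Iterable, Optional


def fetch_with_fallback(urls: Iterable[str],
                        fetch: Callable[[str], bytes]) -> Optional[bytes]:
    """Try each URL in turn and return the first successful result.

    OSError covers typical network failures (urllib.error.URLError is a
    subclass of it); LookupError stands in here for "data not found".
    """
    for url in urls:
        try:
            return fetch(url)
        except (OSError, LookupError):
            continue  # this user-associated server failed; try the next one
    return None


# Illustrative stand-in for the network: only the second host has the data.
STORE = {"http://node-b.example/u42": b"profile-bytes"}
data = fetch_with_fallback(
    ["http://node-a.example/u42", "http://node-b.example/u42"],
    lambda url: STORE[url],
)
```

The order in which fallback URL's are tried is not specified in the description; this sketch simply uses the order given.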

In preferred embodiments, user-associated servers take responsibility for serving the nearest neighbors of that particular user to the broader community. This causes data for similar users to gravitate toward being stored on the same machines. One advantage of this technique is that if user-associated server A is being accessed and provides a NEIGHBORBAG for similarity testing, it is likely that when the accessing node wants to get the taste profiles for the users in the bag, seconds or minutes later, that machine will still be available on the network.

A further improvement is that, instead of sending the taste profiles to the accessing user for the similarities to be calculated, they can be calculated on the user-associated server in cases where it is judged, when data transmission expenses are calculated, that it would be more efficient to send the data there. In such a case, the querying node would upload its taste profile to the user-associated server so that multiple comparisons can be carried out there without further need for network data transmission.

In further embodiments, such user-associated servers not only store the neighbors of their associated users, but also other neighbors with relatively high similarity to other users that are stored on that user-associated server. For instance, in some embodiments a centroid may be calculated that represents an average of the taste profiles of the users stored on that server. One type of taste profile contains identifiers for every song a user has played on a particular target platform (such as Apple's iTunes), together with the date it was first added to the user's collection and the number of times he has played it. A centroid averaging a number of such user profiles might contain the identifiers for all the songs played by any of the associated users, together with, for each song, the average of the dates it was added to the system and the average number of plays of that song per user.
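A centroid of this kind might be computed as in the sketch below. The profile layout (a mapping from song identifier to a pair of date added and play count) is an assumed representation. The date average is taken over the users who have the song, while the play average is taken over all users on the server; the latter is one reading of “per user” and is an assumption.

```python
from datetime import date

Profile = dict[str, tuple[date, int]]  # song ID -> (date added, play count)


def centroid(profiles: list[Profile]) -> dict[str, tuple[date, float]]:
    """Average a list of taste profiles into one summary profile."""
    n_users = len(profiles)
    ordinals: dict[str, list[int]] = {}
    plays: dict[str, int] = {}
    for profile in profiles:
        for song, (added, count) in profile.items():
            ordinals.setdefault(song, []).append(added.toordinal())
            plays[song] = plays.get(song, 0) + count
    return {
        song: (date.fromordinal(round(sum(days) / len(days))), plays[song] / n_users)
        for song, days in ordinals.items()
    }


p1 = {"s1": (date(2004, 1, 1), 10)}
p2 = {"s1": (date(2004, 1, 3), 20), "s2": (date(2004, 2, 1), 4)}
summary = centroid([p1, p2])
# summary["s1"] is (date(2004, 1, 2), 15.0); summary["s2"] is (date(2004, 2, 1), 2.0)
```

Because the result has the same shape as an individual profile, apart from fractional play counts, the same similarity metric can be applied to it.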

The algorithm described above to find the most similar neighbors for a user may be carried out but with respect to this centroid rather than with respect to the user. The ID's of the users most similar to this centroid are stored in a neighbor list for the centroid, and their profiles and neighbor lists (together, their proxies) are the ones that that particular user-associated server takes responsibility for serving to the community. But it should not be construed that the invention is limited in scope to the concept of “centroid” or “averaging.” Any summary of multiple users' profile information that is comparable via a similarity metric to an individual user's profile is equivalent for the purposes of the invention.

For example, in some embodiments that involve users' interests with respect to text documents, a user's interests may be captured in a list of the most unusual keywords that regularly turn up in text they read. For instance, a paleontologist might read text containing the word “archaeopteryx” fairly frequently. The exact frequency isn't as important as the fact that the population at large very rarely reads text with that word whereas the paleontologist frequently does. So, the paleontologist's interest profile can be realistically represented by a list of such words that meet certain predetermined thresholds for “unusualness” with respect to the general population, and “frequency” with respect to the user himself. Extending that concept to a group of users rather than a single user, it is clear that the interests of a group of similarly-minded individuals can be represented by a list that contains all the words that are in any of the individuals' personal word-lists (or that are in some predetermined proportion of such lists). This is a completely different approach from using averaging to create a centroid, but it falls equally within the scope of the invention, as do all other approaches which serve the purpose of representing an individual's interest where individuals are concerned, and summarizing such interests for a group where groups are concerned, as long as it is possible to compare the interest profiles of individuals to each other or individual interest profiles to summary interest profiles or summary interest profiles to summary interest profiles and calculate appropriate similarity metrics. (With respect to the word list, a simple similarity metric is to calculate the percentage of words, out of the total pool of words formed when the lists are combined, that are held in common. A more sophisticated approach is to consider every word in the combined list to be a “trial”, with success being that the word is held in common; the similarity metric is then the posterior mean based on a binomial distribution and a beta prior.) Note that this process may frequently result in more than one user-associated server hosting the proxy of a given user. That is good, because that allows for redundancy in the system for times when a user-associated server is not available. Moreover, there is more redundancy for users who are similar to a lot of users than for users who are similar to only a few others. This allows for providing the most reliable and efficient service to the most people.
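The word-list summary and the two word-list similarity metrics mentioned in the parenthetical above might be sketched as follows. The function names are hypothetical, and the uniform Beta(1, 1) prior used as the default is an assumption; the description does not specify the prior's parameters. For a Beta(α, β) prior with k words held in common out of n words in the combined pool, the posterior mean is (α + k) / (α + β + n).

```python
from collections import Counter


def group_word_list(word_lists: list[set[str]], min_proportion: float = 0.0) -> set[str]:
    """Words appearing in at least min_proportion of the individual lists.

    With the default of 0.0 this is the union of all the lists.
    """
    counts = Counter(word for words in word_lists for word in words)
    needed = min_proportion * len(word_lists)
    return {word for word, count in counts.items() if count >= needed}


def overlap_fraction(a: set[str], b: set[str]) -> float:
    """Simple metric: fraction of the combined pool that is held in common."""
    pool = a | b
    return len(a & b) / len(pool) if pool else 0.0


def posterior_mean_similarity(a: set[str], b: set[str],
                              alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean of the 'held in common' probability under a beta prior."""
    trials = len(a | b)
    successes = len(a & b)
    return (alpha + successes) / (alpha + beta + trials)


x = {"archaeopteryx", "theropod", "cretaceous"}
y = {"archaeopteryx", "theropod", "jurassic"}
print(overlap_fraction(x, y))           # 0.5
print(posterior_mean_similarity(x, y))  # 0.5
```

The two metrics differ most for short lists: two identical one-word lists score 1.0 on the simple metric but only 2/3 on the posterior mean with a uniform prior, which reflects how little evidence a single shared word provides.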

As a further example, in some embodiments the summary is simply the taste profile of the user associated with the user-associated server that is directing the search. By finding nearest neighbors to that user, the search is also finding neighbors who are relatively similar in taste to the other users whose profiles are stored on that user-associated server, as long as the question of whose profile shall be stored is also resolved by virtue of having a high similarity metric with respect to the user associated with the user-associated server.

In further embodiments, each user-associated server carries out searches using an algorithm almost identical to one of those described above, with the exception that the search is done with respect to similarity to the collection of users whose proxies (whether the proxy contains the taste profile or the user's neighbor list or both and/or contains other items) are already being served from that particular user-associated server. (This is as opposed to doing such searches with respect to each individual user whose proxy is stored on the server or facilitating, by serving data, such searches carried out by the individual user-associated nodes.) This may be done, as described above, by comparing other users to a centroid of the collection or it may be done by other summary means (all of which fall within the scope of the invention). The standard literature on the subject of data clustering will reveal a number of methods that are equivalent for the purposes of this specification. In preferred such embodiments, the user who is associated with the user-associated server is always among the users whose proxy would be added to that collection if the user wasn't already there. For instance, in the method which involves a centroid produced by averaging the profiles of the users, the algorithm would never remove the user associated with the user-associated server from the list of users whose profiles are averaged to produce the centroid.
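By way of example only, the centroid variant may be sketched as below. Profiles are assumed to be dictionaries mapping item identifiers to numeric ratings or rating-equivalents, a missing item is assumed to count as zero, and cosine similarity is used merely as one possible metric. All names are hypothetical and do not come from the program listing.

```python
import math


def centroid(profiles):
    """Average a list of {item: value} profiles, treating missing items as 0."""
    n = len(profiles)
    totals = {}
    for profile in profiles:
        for item, value in profile.items():
            totals[item] = totals.get(item, 0.0) + value
    return {item: total / n for item, total in totals.items()}


def server_centroid(host_profile, hosted_profiles):
    """Centroid for one user-associated server.

    The host user's profile is always included, so the host is never
    dropped from the set of profiles being averaged.
    """
    return centroid([host_profile] + list(hosted_profiles))


def cosine(a, b):
    """Cosine similarity between two sparse profiles (assumed metric)."""
    dot = sum(value * b.get(item, 0.0) for item, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A candidate user would then be compared to the result of server_centroid rather than to each hosted profile individually.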

NOTES FOR APPENDIX 3

A central server may be not only a single server computer, but a set of such computers, the distinguishing characteristic not being the number of computers in the central server, but rather the fact that they are not associated with a particular user but rather made available on the network to serve data to a substantial number of user-associated computers.

When this specification uses the term “associated with” for the relationship between a user and a computer, the computer is the computer that the user normally accesses to get the benefits of the system, for instance, viewing a list of the users that are more similar to him than any others that have been examined.

The term “target user” is used occasionally in this specification to refer to a particular user who is using the invention and for whom the invention has found, and/or is finding, other users with similar interests and/or tastes.

Preferred embodiments make a display of the individual users who have been found to be most similar to the target user available through a computer user interface. In some embodiments this takes the form of a list; in others there are other displays such as images representing the users in 2D or N-Dimensional space. In some embodiments the positions such images take with respect to each other in the visual plane represent how similar they are to each other.

Preferred embodiments make recommendations to the target user of specific items based on a list of nearest neighbors, that is, a list of neighbors who are relatively similar to the target user in taste when compared with other users of the system. They do this by processing the preferences of the nearest neighbors in ways that are similar to how this is done in other nearest-neighbor-based collaborative filtering systems such as, for example, the GroupLens Usenet filtering system, http://www.si.umich.edu/~presnick/papers/cscw94/GroupLens.htm, incorporated herein by reference, or the system described in Upendra Shardanand's 1995 thesis, Social Information Filtering: Algorithms for Automating “Word of Mouth,” http://citeseer.nj.nec.com/rd/61053528%2C323706%2C1%2C0.25%2CDownload/http://citeseer.nj.nec.com/cache/papers/cs/15862/http:zSzzSzmas.cs.umass.eduzSz%7EaseltinezSz791SzSzshardanand.social_information_filtering.pdf/shardanand95social.pdf, incorporated herein by reference. Note that those two papers, and others, describe how recommendations may be made once a list of nearest neighbors has been determined, and those and other approaches exemplified by those may be used once such a list has been determined, regardless of the particular calculation originally done to determine the degree of similarity another user has and thus how the decision was made about whether to add him to the list of nearest neighbors.

However, it is important to note that while the papers mentioned above make recommendations based on ratings manually entered by the users, the present invention may be used in situations where no such ratings are available. Instead other information may be available, such as the fact that the user has purchased particular items, or has chosen to experience them a certain number of times (for instance, has played a musical track a certain number of times). When only purchase data is available, a purchase can be considered to be equivalent to a rating of “good” and no purchase can be considered equivalent to a rating of “poor”. When the number of times a user has chosen to experience an item is available, an easy way to approximate the effect of having ratings is to rank the items by the number of experiences. Then divide the rank by the number of items. This results in a number between 0 and 1 that can be used as a rating-equivalent, normalized to that interval so that the “ratings” of all users are on the same scale. So the techniques mentioned in the afore-mentioned papers, and others, are still usable even where there are no explicit ratings.
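The rank-based rating-equivalent described above may be illustrated with the following minimal sketch. It assumes that the least-experienced item receives rank 1, so that the most-experienced item receives a rating-equivalent of 1, and that ties are broken by item identifier; neither detail is specified in the text above, and the function name is hypothetical.

```python
def rating_equivalents(play_counts):
    """Map {item: number of experiences} to {item: rank / number of items}."""
    n = len(play_counts)
    # Ascending order: the most-played item gets rank n and rating 1.0.
    # Breaking ties by item identifier is an assumption of this sketch.
    ordered = sorted(play_counts, key=lambda item: (play_counts[item], item))
    return {item: rank / n for rank, item in enumerate(ordered, start=1)}
```

Because the result depends only on the ordering of a user's own counts, a user who plays everything a hundred times and a user who plays everything twice end up on the same scale.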

However, for purposes of example, a particular technique of making recommendations for situations where nearest neighbors have been found and “number of experiences” data is available for each item will be presented here.

This technique is to simply add up the number of experiences for each item for all nearest neighbors. For example, assume that out of a universe of 1,000,000 music fans, the system has found 100 nearest neighbors for the target user. For each item associated with each fan, there is a count of how many times each song has been played. If the system simply adds up these counts for each item, the item with the highest total count may be considered to be the most popular item in that community, and should be recommended to the target user if he hasn't already experienced it. Equivalently, one can compute the arithmetic mean of the number of plays, where the number of plays is 0 for users that haven't experienced the item at all.

A variant of the approach described in the previous paragraph that is arguably more reliable is to compute log(1+K) for each neighbor/item combination, where K is the number of times the user has experienced the item in question, and then calculate the sum of these values for the population of nearest neighbors. The higher that sum is, the more highly the item should be recommended. The advantage of using the log is that for an item to be recommended highly, it is more important for the item to be experienced often by a large number of nearest neighbors than it is for a few nearest neighbors to have experienced the item a huge number of times.
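A short sketch of both scoring variants follows, for illustration only. It assumes each nearest neighbor is represented by a dictionary mapping item identifiers to play counts; the function names and the top_n parameter are hypothetical.

```python
import math
from collections import defaultdict


def score_items(neighbor_play_counts, use_log=True):
    """Sum K (plain variant) or log(1 + K) (log variant) over all neighbors."""
    scores = defaultdict(float)
    for counts in neighbor_play_counts:
        for item, k in counts.items():
            scores[item] += math.log(1 + k) if use_log else k
    return dict(scores)


def recommend(neighbor_play_counts, already_experienced, top_n=10, use_log=True):
    """Return the highest-scoring items the target user has not experienced."""
    scores = score_items(neighbor_play_counts, use_log)
    candidates = [item for item in scores if item not in already_experienced]
    # Highest score first; ties broken by item identifier for determinism.
    return sorted(candidates, key=lambda item: (-scores[item], item))[:top_n]
```

For instance, if one neighbor has played item X 100 times and four neighbors have each played item Y 3 times, the plain sum ranks X first (100 versus 12), while the log variant ranks Y first, since 4 log(4) is about 5.55 and log(101) is about 4.62.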

The same two papers as mentioned above that discuss collaborative filtering, and others such as the specification of my own U.S. Pat. No. 5,884,282, herein incorporated by reference, describe different ways of creating metrics to capture degrees of similarity between two users. All such metrics fall within the scope of the invention. The invention isn't limited to particular metrics; rather the focus of the invention is on the structure of the search and where the relevant data is stored.

A similarity metric that is used in preferred embodiments where explicit user-entered ratings are not available is the following. Assume user A is the target user, and we want to know how similar user B is to user A. We calculate an approximation, subject to certain assumptions which are useful to us but may not be true in the real world, of a certain probability. This can be loosely summarized as being the probability that, if a randomly chosen item X not in A's collection but in B's collection is put into A's collection, and we then pick a random time in the future when A is experiencing an item from his collection, that item will be X. An implementation of this concept that teaches the technique is included in the tasteprofile.py module included in the computer program listing appendix and described in Appendix 4.

Embodiments of this invention serve the useful purpose of determining which other participating users are most similar to a user who is a participant in the system, and storing that information in the computer for purposes of displaying that community and/or making recommendations of desirable items. Further embodiments not only store that information, but display the community members and/or recommendations through the system's user interface.

Some embodiments store each user's profile on their associated computers. Due to issues mentioned above, many user-associated computers may not be accessible to other users from the Internet. So a technique must be provided by which users can serve their profiles when they are stored on user machines. Gnutella-style networks provide an example for this. Nodes which are accessible from the Internet allow incoming connections to be made from nodes which are not necessarily connected. Then, data on those not-otherwise-accessible nodes is made available to other nodes on the network, through the network-accessible nodes which the not-otherwise-accessible nodes are connected to. In the case of Gnutella, this data includes lists of available files and the files themselves. (See http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf, hereby incorporated by reference, for more information on the details of the Gnutella approach.) In the present invention, the network-accessible servers usually store lists of the user ID's associated with the nodes they are connected to, and when a request arrives for data associated with one of those ID's, the request is routed to the appropriate connected node, the data is retrieved by the network-accessible node, and then sent by the network-accessible node to the requesting node. Most embodiments that use the search algorithm described earlier in this specification modify it when it is used in the configuration described in this paragraph so that if the data for an ID is not available a “continue” is called in the innermost loop so that control goes to the top of the loop, and processing continues as if that information had not been requested. Note that to facilitate “hits” occurring as frequently as possible, nodes normally try to connect to network-accessible computers who are on their nearest-neighbors list. This makes it likely that network-addressable nodes will be connected to some of their associated users' nearest neighbors, so that when the interest profiles of neighbors are needed by the algorithm, they can more often be retrieved. In general, the presented algorithm is modified so that where, originally, ID's of similar users are requested, information is provided that can be used to construct one or more URL's where the information can be found. If the information is not found on a directly network-accessible computer, the URL of a network-accessible one (such as the one providing the URL!) can be given, which includes parameters such as the ID of the user whose information is desired, to tell that node which possibly-connected node to get the information from. An individual of ordinary skill in the art of peer-to-peer software development will understand how to create the necessary software in accordance with this description. It should be stressed that this paragraph is for example only, and that there are many equivalent variants that involve, for instance, caching data on intermediate user-associated nodes, transporting profiles to other nodes for comparison, etc. This invention's scope must not be construed as being dependent on specific techniques for making the data and computations available in a peer-to-peer setting.

In some embodiments two forms of interest profiles are created and stored. One is a very small (in terms of the amount of data) representation. For example, if the main interest profile contains the song names and artist names for songs in the user's collection and the number of times he has played each one, which could have thousands of entries, this miniature profile may contain only the user's 10 most frequently played songs, each identified by a hash such as that generated by Python's built-in hash() function. Preliminary screening, including climbing, happens as described elsewhere in this specification using the miniature rather than the full profile. Then as a last step, before adding another user to the target user's nearest neighbor list, the full profiles are checked to be sure the similarity metric is really high enough that the user should be a nearest neighbor (for instance, that it's higher than the metric associated with the least similar neighbor). If it doesn't meet this final test, it doesn't go on the list.

When a miniature profile is used, any technique that serves to produce a relatively small (from the perspective of number-of-bytes), not necessarily complete, representation of the data in the interest profile may be used. The scope of the invention is not limited to particular miniaturizing technologies. For instance, in addition to the simple approach described above, applicable approaches include using all of the item hashes without any counts, using a random selection of items and including the song name itself rather than a hash and optionally further using standard compression algorithms such as are in the standard Python zlib library.
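One possible form of miniature profile, preliminary screening, and zlib compression is sketched below for illustration. Note one substitution: in current Python 3 releases the built-in hash() of a string varies from process to process unless PYTHONHASHSEED is fixed, so this sketch uses a truncated MD5 digest so that different nodes compute the same value. The overlap measure, the threshold, the JSON encoding, and all names are assumptions of the example.

```python
import hashlib
import json
import zlib


def stable_hash(text):
    """Short digest that is identical on every node (stand-in for hash())."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:8]


def miniature_profile(play_counts, size=10):
    """Hashes of the `size` most frequently played songs."""
    top = sorted(play_counts, key=lambda s: (-play_counts[s], s))[:size]
    return frozenset(stable_hash(song) for song in top)


def miniature_similarity(mini_a, mini_b):
    """Overlap of two miniature profiles (assumed screening metric)."""
    pool = mini_a | mini_b
    return len(mini_a & mini_b) / len(pool) if pool else 0.0


def screen_candidates(target_mini, candidate_minis, threshold):
    """IDs that pass preliminary screening and merit a full-profile check."""
    return [uid for uid, mini in candidate_minis.items()
            if miniature_similarity(target_mini, mini) >= threshold]


def compress_profile(play_counts):
    """Serialize a profile and compress it with zlib."""
    return zlib.compress(json.dumps(play_counts, sort_keys=True).encode("utf-8"))


def decompress_profile(blob):
    """Inverse of compress_profile."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Only the candidates returned by screen_candidates would have their full profiles retrieved and compared before being added to the nearest neighbor list.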

“Neighbors,” “users,” and similar terms are often used in this specification to represent their interest profiles, ID's etc.; the meaning is clear in the context.

APPENDIX 4: SOURCE CODE

The source code is contained on the computer program listing appendix. Notes about several specific modules follow:

MODULE: tasteprofileclass.py

The pair of classes appearing in this module, CalcData and TasteProfile, are tightly connected. Each TasteProfile object may have a number of associated CalcData objects. Each CalcData object represents one song in the collection of the user whose TasteProfile it is.

Methods are provided for loading the object from various sources; a programmer of ordinary skill will readily infer the formats from the input code.

It is worth noting that for convenience and to save memory, songs are frequently identified by an MD5 hash based on combining and normalizing their artist, album, and song names.
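As a hedged illustration of such an identifier, the sketch below lower-cases each field, collapses whitespace, joins the fields with a separator character, and takes the MD5 digest. The normalization rules, the separator, and the name make_song_id are assumptions of this example; the listing's own function may normalize differently.

```python
import hashlib


def normalize(text):
    """Lower-case and collapse runs of whitespace (assumed normalization)."""
    return " ".join(text.lower().split())


def make_song_id(artist, album, song):
    """MD5 hex digest of the normalized artist, album, and song names."""
    # The unit-separator character keeps ("ab", "c") distinct from ("a", "bc").
    combined = "\x1f".join(normalize(part) for part in (artist, album, song))
    return hashlib.md5(combined.encode("utf-8")).hexdigest()
```

Two users whose music files differ only in capitalization or stray spacing would then yield the same 32-character identifier for the same song.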

The most important method is probably TasteProfile.calculateSimilarity(), which compares the TasteProfile object on which it is called with another one passed to it as a parameter. Usually this is used for the local user to sequentially compare his profile to those of other users, in order to find the best ones, namely the nearest neighbors.

In such usage, a nearest neighbor list of a predetermined length is maintained, and when a profile of greater similarity to the local user comes along, compared to the least similar of the current nearest neighbors, the least similar one is removed from the list and the new one added.
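The list maintenance just described may be sketched, under stated assumptions, with a min-heap keyed on similarity so that the least similar neighbor is always at the front. The class and method names are hypothetical and are not those of the program listing.

```python
import heapq


class NearestNeighborList:
    """Keeps at most `max_length` users with the highest similarity seen."""

    def __init__(self, max_length):
        self.max_length = max_length
        self._heap = []  # min-heap of (similarity, user_id)

    def offer(self, user_id, similarity):
        """Consider one candidate; return True if it was added to the list."""
        if self.max_length <= 0:
            return False
        entry = (similarity, user_id)
        if len(self._heap) < self.max_length:
            heapq.heappush(self._heap, entry)
            return True
        # The heap root is the least similar of the current neighbors.
        if similarity > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)
            return True
        return False

    def neighbors(self):
        """User IDs ordered from most to least similar."""
        return [user_id for _, user_id in sorted(self._heap, reverse=True)]
```

Each call to offer costs at most logarithmic time in the list length, which matters when the local user's profile is compared sequentially against a large number of candidates.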

MODULE: recommenderclass.py

This module handles the task of using the list of nearest neighbors, and their associated profiles for recommendation purposes.

It makes recommendations, subject to an “adventurousness control.” When the control is at one extreme, it looks for consensus among neighbors; as it moves toward the other extreme, it is more and more sensitive to opinions of individual users. (In the current embodiment, these opinions are expressed passively simply by recording how many times each song is played.)

MODULE: genrerankhandlerclass.py

The code in this module represents one way of clustering data containing songs where the songs (or most of the songs) have associated genre information. Of course, it can be used analogously for other subject areas; for instance in the area of academic research, it could make use of the papers in the users' collections (rather than songs), and their associated keywords (rather than genres).

This algorithm has the advantage that it is much faster than most general clustering algorithms, due to making use of the effort that originally went into creating the genre information. Furthermore, programmers of ordinary skill in the art will readily see various ways of improving the speed of the code further (at the cost of more code complexity).

MODULE: clusterfitterclass.py

On a server, this is a helper class for genrerankhandlerclass.py. However, it has another use as well. On the client, it serves to tell the clients which identifier is associated with the cluster a client should download first. That is, it outputs a sorted list of clusters, with the ones most likely to yield high similarity to the local user first.

It does that by means of summary data (the xInitData parameter on the __init__ method) that is sent to the client from the server and which summarizes the differences between the clusters.

In the current embodiment (from which this code is derived), this enables clients to request the clusters that are most likely to have good similarity matches first; this downloading is accomplished via BitTorrent. We do not include the BitTorrent-related code here because techniques for accomplishing a BitTorrent download are readily apparent to a programmer of ordinary skill.

APPENDIX 5

This Appendix describes a class of embodiments wherein some of the user nodes run software that has only a one-way connection to the other nodes and server (if one exists). These embodiments include cases where the connections to the other nodes and server (if one exists) involve more than one medium. We will focus on a specific example where some of the user nodes, which may be full personal computers or may be hand-held devices such as Apple Computer's ipod, have radio circuitry incorporated into them which allows them to receive transmissions from terrestrial or satellite radio broadcasters. (In the case of satellite transmitters, these may include the specific hardware associated with the Sirius or XM satellite radio services.)

In the prior art at the time of this writing, Sirius Satellite Radio has announced a handheld device, to be called the S50, which will work with its satellite network and save songs on its internal data storage. It does not have the ability to receive satellite signals on its own. Rather it can only receive songs when attached to a docking device. Samsung has announced its neXus XM Satellite Radio/MP3 Players. Users will be able to “tag” songs they hear on the radio for purchase through the XM+Napster online service. The neXus unit will not have a built-in antenna; rather it will connect to a dock which has an antenna, and will record songs from the satellite service for later play without the dock attached.

XM Satellite Radio sells a Delphi XM SKYFi2 unit which includes internal storage for pause and 30-minute replay, although the antenna is separate. It has announced a Delphi XM MyFi unit which is handheld and includes an internal antenna.

What is missing from the prior art is a way to enable the user to receive personalized recommendations or a “virtual channel” constructed automatically for the benefit of that user to enable him to have the experience of a radio channel specifically geared towards his or her individual tastes.

The present invention provides a solution to this need.

In this set of embodiments, the nodes with two-way connections work as described elsewhere in this specification. On the local node, reference data is collected, nearest neighbors are found, recommendations are generated, and the taste profile of the local user is distributed to other user nodes to be used by them in a similar way if they are deemed by the software to be similar enough in represented taste and interests to the local users of those nodes. Not all embodiments of this variant that fall within the scope have the nodes with two-way connections receiving the taste profiles in an order related to likely similarity to the local user's tastes. Typically these nodes are connected by a network such as the Internet which readily handles two-way communication.

The nodes with one-way connections, in preferred embodiments, receive taste profiles via satellite radio. Satellite radio uses digital signals that can easily send taste profile data on one or more channels while sending audio and/or video content such as podcasts on others, and/or it can send a subset of those types of data on a single channel by transmitting one type at one time and other types at other times.

In preferred such embodiments the one-way nodes, which in further embodiments may be one-way at some times and two-way at other times, are hand-held devices like the Apple ipod which include a CPU and memory to store content data such as audio and video data, where such memory will include RAM and may include hard drives, flash memory, or other kinds of persistent storage. Hand-held devices are meant to be carried from place to place by an individual, and many such devices do not have ongoing two-way communication abilities due to the difficulties and expense of maintaining network connections from remote locations. For such devices, satellite radio provides an excellent transmission medium for the taste profile and digital content information used by the present invention.

The one-way devices (which may, in some embodiments, have two-way connections at other times) receive taste profile and content information. They also have at least one way of inferring the user's tastes and interests. In various embodiments these may include buttons to rate content he is hearing and/or viewing, or they may include monitoring which content the user stops prematurely or skips over using a mechanism such as a fast-forward button, and which content the user repeats. Some embodiments monitor whether a user uses a rewind-like button to experience portions of content more than once; for instance in listening to spoken word content, the user may want to hear some of it more than once to aid his understanding. Preferred embodiments have an input mechanism such as a button that indicates that a user likes a unit of content (such as a song) and would like to hear it again.

By using such mechanisms, input is provided to the software whereby the software creates a profile indicating certain likes and/or dislikes of the user.

Taste profile data received via the one-way medium is then processed as described elsewhere in this specification. Taste profiles that are similar to those of the local user are stored and used for recommendation purposes. User profile information may also be used for community purposes; for instance, in a cell phone embodiment, a telephone number or address may be provided whereby the local user can call the other user whose taste profile matched. In some such embodiments, additionally, a contact recipient will receive bio and/or taste profile information from the local user and hear or view it before deciding whether to take the call; in further such embodiments the receiver has criteria set in his software that automatically screen for certain biographical characteristics or a certain degree of similarity before the user is alerted to the incoming call. In further such embodiments location data such as GPS information is used, so that the local user is made aware of the location (which may not be current) of the remote user, or the software screens on location data so that the local user is only alerted to profiles associated with nearby locations, and/or, alternatively, the remote user's software screens attempted contacts based on the location of the local user.

Note that preferred such devices have both satellite radio-receiving and cell phone capabilities. Satellite radio reception may be maintained with typically lower consumption of bandwidth and energy resources than cell phone connections, and typically has higher data transfer rates, so it is helpful to receive a stream of data from the satellite, while also having the hardware required to allow the user to make a cell phone call.

A key to this class of embodiments is the fact that the overall network contains both one-way and two-way nodes at a given instant in time (again, some of these nodes may change roles at other times). This enables taste profile and (in preferred embodiments) biographical information or other data such as location to be sent on the network to be received and used by the one-way, receive-only nodes. Because of this mix of node types, it is practical to collect the taste profile data on the two-way nodes which is used to make recommendations on the one-way nodes.

In preferred embodiments of two-way nodes containing a broadcast (for instance, satellite) radio receiver as well as wi-fi, ethernet, or other connection to a typical Internet service such as a dial-up service, cable modem, or DSL, data derived from other users is substantially or wholly received via the broadcast radio receiving circuitry, while data is uploaded via the Internet. This minimizes the use of limited Internet “bandwidth” for receiving large amounts of data.

A detailed description of a particular embodiment:

Software incorporating all or much of the software contained in this specification runs on a large number of desktop personal computers, connected to the Internet. We will refer to it as the Goombah software, since there is presently software of that name that incorporates much of that code. The users of those computers use Apple Computer's iTunes software to play music. iTunes writes an XML file XXX on disk containing the identifiers for each track in the user's collection. The Goombah software XXX reads this XML file and uses it as the user's taste profile data. This data is sent to a server XXX, under the control of which the data is communicated, not only to other personal computers, but to a terrestrial radio transmitter XXX that sends the data to the satellite XXX or satellites being used to facilitate a satellite radio service such as XM or Sirius. From the satellites it is broadcast to portable units XXX which could be, for instance, Sirius or XM-enabled versions of Apple Computer's ipod device.

On the other personal computers, the taste profiles contained in the data play the role of candidate nearest neighbors; the nearest neighbors are selected and used to provide content recommendations, as described elsewhere in this specification.

On the portable devices, an analogous process of neighbor selection and recommendation occurs. However in the embodiment currently being described, the recommended music takes the form of at least one virtual channel. That is, from the user's point of view, it behaves much like a standard satellite radio channel, but at least much of the time, the content is selected, scheduled, and played on the user's local portable device.

In this embodiment, there is an easily accessible Save button. When the user first starts using the device, he tunes into one of the standard satellite channels which he thinks is likely to be a good approximation to his tastes. When he hears content (for instance, a song) that he particularly likes, he presses Save. (In some other embodiments, there is a button for the explicit purpose of enabling the user to indicate that he likes a song; there may be another one to indicate dislike; or there may be an input mechanism such as physical “radio buttons,” which allow only one to be pressed at a time, allowing a degree of liking a song to be expressed; other variants are also applicable. In further embodiments there is not a dedicated physical button for this purpose, but instead controls are provided whereby the user can navigate through a menuing system to choose a “Save” option. Samsung's neXus player will have a mechanism through which the user can “tag” a song for purchase; since available photographs do not show a dedicated button for this purpose but rather an input mechanism that appears similar to the ipod's for navigating a menu system, the tagging function is undoubtedly activated through that menu system. In the context of the neXus device, tagging a song implies the user probably likes it because most people will tend to buy songs that they like [although some will buy songs for others such as their children; still the statistical likelihood that tagging implies liking a song makes it appropriate for our purposes]. So embodiments built into an improved neXus device may use the tag function for this purpose. Alternatively or in addition, a separate “I like this” option may be available through the same menu structure which would serve the purpose here attributed to the Save button. Ideally, such a future device will have a built-in antenna akin to the Delphi XM MyFi's antenna. All such variants fall within the scope of the present invention.)

The song has been stored in RAM even after the earlier parts of the song were played. So it is in RAM and available to be moved to persistent storage such as flash memory or a hard drive when the Save button is pressed. Typically there is a pause between songs, and pressing Save during that pause causes the previously played song to be saved. When a song is Saved it can be played again later with greater frequency than would be the case if the user simply waited for the satellite channel to broadcast it again. The portable device automatically schedules the song to be played again later, and does the same for other Saved songs. For instance Saved songs may be played daily for the first week, then every other day for the next week, then every third day for the week after that, etc. An unlimited number of scheduling variants are possible. The embodiment described here additionally mixes songs from the user's favorite satellite channels with stored songs; this is one way the user hears new songs that he can decide to Save or not.
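The example schedule (daily, then every other day, then every third day) may be sketched as follows, for illustration only. The sketch assumes that day 0 is the day on which the song is Saved and that the interval restarts at each week boundary; these details and the function name are assumptions, and many other schedules would serve equally well.

```python
def replay_days(weeks):
    """Day offsets (0 = day of saving) on which a Saved song is replayed."""
    days = []
    for week in range(weeks):
        interval = week + 1   # every day in week one, every 2nd day in week two, ...
        start = week * 7      # first day of this week
        days.extend(range(start, start + 7, interval))
    return days
```

For three weeks this yields plays on days 0 through 6, then days 7, 9, 11, and 13, then days 14, 17, and 20, so the song gradually recedes as newer Saved songs enter the rotation.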

Also, since the device described in this embodiment has ipod-like functionality, the Saved song may be found and played again at the user's will by means of the ipod's standard navigation features, including being played automatically in the device's Shuffle mode.

So the Save button has easily-understood use and value for the user. However, it also serves the purpose of being an input for taste profile data. When the user Saves a song, it goes into his taste profile. Unsaved songs may not go there, although in some variations of this embodiment, satellite songs that the user has heard in their entirety (i.e. he didn't turn the device off, select a song to play from the device's internal library, switch to another satellite channel, or perform some other action that cuts the song off) are stored in the profile with diminished mathematical weight. And in some variants, songs that were cut off are stored as songs that are disliked.
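A minimal sketch of how these three kinds of listening event might feed a taste profile is given below. The numeric weights, the event names, and the choice to accumulate weights per song are all assumptions made for the example; the text above specifies only that fully heard songs carry diminished weight and that cut-off songs may be treated as disliked.

```python
# Assumed weights; only their relative ordering reflects the description.
EVENT_WEIGHTS = {
    "saved": 1.0,           # user pressed Save
    "heard_in_full": 0.25,  # played to the end without being Saved
    "cut_off": -0.5,        # user switched away before the song ended
}


def record_event(profile, song_id, event):
    """Add the weight for one listening event to a {song_id: weight} profile."""
    if event not in EVENT_WEIGHTS:
        raise ValueError("unknown event: %r" % (event,))
    profile[song_id] = profile.get(song_id, 0.0) + EVENT_WEIGHTS[event]
    return profile
```

A variant that ignores cut-off songs entirely can be obtained by setting that weight to zero.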

If the device is permanently one-way, that is, it never has a direct or indirect (through a PC) ability to send data onto the Internet or another network, the taste profile built by the Save button (and/or other techniques) is never made available to other users. However, for the local user's benefit, it makes it possible to discern which candidate neighbors received from the satellite are nearest neighbors, and the device can therefore generate recommendations in the usual way.

As the taste profile for the user of the portable device grows because of the use of the Save button, the recommendations that can be generated in the usual manner become more and more accurate.

The embodiment currently being discussed involves a unique identifier for each song, which is an md5 hash of a concatenation of the song artist, name, and album, as defined by the makeSongHash function in the accompanying code. (Other variants use other techniques such as fingerprints of the audio data, an md5 hash of a text representation of the audio data, etc.) This identifier is contained in the taste profiles for the user and is used as the representation of the song in that data (or as one such representation).

When a song is recommended, it goes into a list in the device's storage, and is checked against a broadcast schedule transmitted periodically by the satellites and received by the device. The device then knows to record certain songs sent on certain channels in the future, and does so when the time comes, saving the song data into persistent storage and adding the song to the device's music library. In this way the device builds a library of music that the user is likely to enjoy. This music is added to the user's virtual channel, and is also available to play at his will through the device's song navigation mechanisms.
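The schedule check described above can be sketched as follows. The ScheduleEntry fields and the plan_recordings name are hypothetical; the specification does not define a schedule format, so this assumes each entry carries a song identifier, a channel number, and a start time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduleEntry:
    song_id: str      # identifier such as the md5 song hash
    channel: int      # assumed: channels are numbered
    start_time: int   # assumed: seconds since some agreed epoch


def plan_recordings(recommended_ids, schedule, library_ids):
    """Pick the earliest scheduled broadcast of each recommended song
    that is not already in the device's library."""
    wanted = set(recommended_ids) - set(library_ids)
    planned = {}
    for entry in sorted(schedule, key=lambda e: e.start_time):
        if entry.song_id in wanted and entry.song_id not in planned:
            planned[entry.song_id] = entry
    return sorted(planned.values(), key=lambda e: e.start_time)
```

Recommended songs that do not appear in the current schedule simply stay on the list until a later schedule mentions them.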

With regard to the virtual channel, the result is as if there were a radio channel dedicated exclusively to that individual user's tastes, which gets more and more finely tuned over time.

When the portable device is connected to a personal computer, for instance via FireWire, USB, Bluetooth, Ethernet or Wi-Fi, songs downloaded from the satellite may be transferred to the computer, either for long-term storage in that computer or to be played through the computer's hardware using data persistently stored only on the device.

The embodiment currently under discussion can be used in two modes: subscription and purchase.

In subscription mode, the user pays a set fee per month, and can store as much music downloaded from the satellites (and/or from other sources) as will fit in the device's storage, and the songs may be played as frequently as the user desires. (In some variants there is a tiered subscription service, where for a particular monthly fee, a particular number of songs or artists' music may be stored persistently or a particular amount of storage may be allocated; or songs from the satellites may be played only a particular number of times.)

In purchase mode, Saving a song causes the song to be purchased. (In some variants the Save button is labelled “Purchase”.) When the device is eventually connected to a two-way network or to a wireless-enabled financial “smart card” with debit capabilities, or to analogous financial technologies, the cost is deducted.

Further variants of the embodiment described above:

Rather than receiving a schedule from the satellites, the schedule may be received over the Internet or other network for devices that sometimes have connections to such networks. In further embodiments no schedule is available and instead, a directory is provided of channels together with taste-descriptive data such as a list of genres that each channel focuses on or a list of representative artists, which is used to determine which channels are likely to contain songs the user will want to hear and/or are or will be recommended.

In typical embodiments, taste profile data contains genre info for songs in the taste profile, or the service provides a look-up table mapping song identifiers to genres. When the channels have associated genre information, the device can use the genre information for recommended songs to choose likely channels to listen to in order to receive the songs. When information such as representative artists is used to describe channels, the artists that most frequently appear in taste profiles having the recommended song can be matched against the lists of artists describing different channels, and the channels that best match the currently recommended-but-not-yet-downloaded songs are the ones that the device focuses on in waiting for the song to arrive.

Some embodiments use sonic descriptors of each channel to describe it. For instance, the companies Savage Beast and SoundFlavor describe each song by a set of attributes including such factors as tempo, instrumentation, sex of the singer, and hundreds of others. Some of them are human-generated, and some are software-generated (the software examines the audio data) or generated with the aid of software. It is obvious that with such a collection of attributes, average values or other kinds of summarizations may be generated for each channel that tend to describe the music played on that channel. And a vector or other structure may be provided that enables the attributes associated with recommended songs to be determined.

Such structures may be downloaded via the Internet or from the satellites. On a special channel or interspersed with other data, the satellites can send the attributes associated with each song, either at the same time as a song's audio data is transmitted, or separately; this occurs in preferred embodiments.
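The artist-based channel matching described above can be sketched as follows. This assumes, purely for illustration, that each taste profile containing the recommended song is available as a list of artist names and that each channel is described by a list of representative artists; rank_channels and top_k are hypothetical names.

```python
from collections import Counter


def rank_channels(profiles_with_song, channel_artists, top_k=5):
    """Order channels by how many of the most frequent co-occurring
    artists appear in each channel's representative-artist list."""
    # Count each artist at most once per taste profile.
    counts = Counter(a for profile in profiles_with_song for a in set(profile))
    frequent = {artist for artist, _ in counts.most_common(top_k)}
    scores = {ch: len(frequent & set(artists))
              for ch, artists in channel_artists.items()}
    # Best match first; ties are broken by channel name for determinism.
    return sorted(scores, key=lambda ch: (-scores[ch], ch))
```

The device would then monitor the highest-ranked channels while waiting for the recommended song to be broadcast.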

In some embodiments the attributes associated with songs the user likes, for instance as signified by pressing the Save button, are summarized by software within the handheld device. For example, average values of the attributes can be calculated using arithmetic or geometric averaging, or only the attributes most frequently associated with liked songs may be counted, or other summarization techniques may be used; these comprise a taste profile of the user instead of, or in addition to, the taste profile built from identifiers of liked songs (where “liked” songs may also be signified by being already-owned by the user). In some embodiments there is an additional input device such as a button that signifies that the user does not like a song; then the averages and/or presence/absence counts used to generate the taste profile may be adjusted negatively by that control in association with a particular song.
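A minimal sketch of two such summarizations follows. It assumes each song's attributes arrive as a mapping from attribute name to a number; treating a disliked song as a negative contribution to the arithmetic average is one possible reading of "adjusted negatively", not the only one, and the function names are hypothetical.

```python
import math


def arithmetic_profile(liked, disliked=()):
    """Signed arithmetic mean: liked songs add, disliked songs subtract."""
    totals = {}
    for sign, songs in ((1.0, liked), (-1.0, disliked)):
        for attrs in songs:
            for name, value in attrs.items():
                totals[name] = totals.get(name, 0.0) + sign * value
    n = len(liked) + len(disliked)
    return {name: total / n for name, total in totals.items()} if n else {}


def geometric_profile(liked):
    """Geometric mean over attributes present in every liked song.
    Assumes all attribute values are strictly positive."""
    if not liked:
        return {}
    shared = set.intersection(*(set(attrs) for attrs in liked))
    return {name: math.exp(sum(math.log(a[name]) for a in liked) / len(liked))
            for name in shared}
```

Either result can serve as the user's summarized taste profile and be compared against per-song attribute sets.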

In some embodiments, each user is associated with a song attribute, and the value of the attribute depends on whether the associated user has the song or not, and/or on how often the user plays the song. So each song has an associated list of attributes corresponding to users, either instead of or in addition to other attributes such as ones derived from the sonic content.

In embodiments where there are too many song attributes to be downloaded without using too much bandwidth, and where the attributes are statistically correlated, factor analysis may be used to reduce the number of attributes into principal components. Based on calculations generated on a server or using distributed systems, the local device can generate the principal components from locally produced data (such as the identifiers of the other users who have each song, as determined by their incoming profiles); these can be summarized to produce a taste profile for the user. Thus it is possible to arrive at a manageable number of attributes for individual songs and local taste profiles. Those of ordinary skill in the art of statistical factor analysis will see how to do this.

In many embodiments having attributes associated with each song (comprising a song taste profile), which correspond well enough to the attributes of a summarized taste profile for the local user that similarity can be measured between the two types of taste profiles, recommendations are generated by using the songs whose taste profiles most closely match the local user's taste profile. Thus instead of the process of finding nearest neighbor users and deriving recommendations from their likes and interests, the nearest neighbors are themselves recommendable items and the nearest ones are therefore recommended.
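As one illustration of matching song taste profiles against the local user's taste profile, the sketch below ranks songs by cosine similarity. The specification does not prescribe a particular similarity measure; cosine similarity and the names cosine and recommend_songs are assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse attribute mappings."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_songs(user_profile, song_profiles, n=3):
    """Return the ids of the n songs whose profiles are nearest the user's."""
    ranked = sorted(song_profiles,
                    key=lambda sid: (-cosine(user_profile, song_profiles[sid]), sid))
    return ranked[:n]
```

Here the nearest neighbors are the songs themselves, so no profiles from other user nodes are needed at recommendation time.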

Some embodiments need no two-way nodes. The portable devices calculate which incoming songs are nearest-neighbors without any data from other user nodes. Note that while human input may be used to decide on the appropriate attribute values for each song, this input need not be done on “user nodes” as we use the term elsewhere in this specification. Rather, that data may be input through software specially designed for the manual entry of such data by someone whose job it is to do that analysis work.

To envision a more concrete example of the invention described in the previous paragraph, satellites broadcast taste profile information for each song. These profiles may be broadcast at the same time as the songs, by interleaving the taste profile data with the song data or by using another channel, or they may be broadcast at other times. In a system where a broadcast schedule is broadcast in advance of broadcasting the songs, it is preferable that the song taste profiles are broadcast a substantial amount of time before the songs themselves so that software may automatically schedule the future recording of very similar songs. As described earlier, the user's local taste profile is refined over time due to input from a Save button or other passive or active indications of taste, and the portable device may never have any two-way connectivity. So using the portable device's CPU to find nearest neighbor songs, based on user taste profiles built up on the local machine and compatible taste profiles broadcast from the satellites, produces a situation where analysis of each song, using human and/or software input, empowers portable devices to adaptively provide ever more appropriate listening material for users.

It should be noted that the above example is for example only and must not be construed to limit the scope of the invention. The role of “portable device” in the example may be played by any CPU-enabled device, including a desktop PC, or a unit built into an automobile or airplane.

When the term “satellite” or “satellites” is used in this specification, it should be noted that whether there is one or more than one satellite makes no difference from the standpoint of this invention, although of course a collection of satellites will provide a broader range of coverage than a single satellite. One advantage of the techniques described here is that, especially in embodiments where the song taste profiles are transmitted in close temporal proximity to the song data, a portable device is enabled to acquire a library of satellite-downloaded music that the user may continue to enjoy even if the device goes out of range of the satellite(s) for some time.

In another set of variations, no taste profiles are sent from the satellites. Instead, a software analysis of each song is done in the portable device itself, determining values for attributes such as tempo. Software to do this sort of thing exists today in, for example, Polyphonic HMI's Hit Song Science technology. Any engineer of ordinary skill with access to such software will see how to integrate it into the present invention. Thus song taste profiles generated by such software play the same role as downloaded song taste profiles do in other embodiments described above. However, there is a substantial advantage to downloading the song taste profiles: present software does not have the ability to examine song data for such attributes as sense of humor in the lyric. There are many such qualities that pertain to recorded music that software is not currently capable of analyzing. So embodiments based wholly on software analysis of the music can be expected not to produce as much user benefit as embodiments involving at least some human analysis of the songs. For spoken-word content, speech-to-text software can determine many of the words spoken, and those can be mapped to content vectors as is often done for document analysis; that can comprise the item taste profiles.

While the above specification focuses on songs for reasons of example, the same approach also applies to spoken-word recordings, “podcasts” involving music played at intervals with spoken-word in between, and video and even purely visual content.

For example, one set of embodiments is based upon an LCD, plasma, or nanotube display hanging on a wall. It displays different images, which may be moving or still, which it receives from a satellite. Taste profiles are downloaded which contain attributes pertaining to each visual item. In some embodiments the taste profiles contain identifiers of other users who like the visual item, supplied by users with two-way network connections. In other cases, or in combination with such human identifiers, the taste profiles contain attributes such as indications of the presence of various colors, hard or soft edges, and whether the image is realistic or abstract, landscape or portrait, etc. In some embodiments software analysis within the local device produces a taste profile; for instance, such information as color is simple to extract from digital image data, and there is existing software, used to block pornographic sites, which can discern such characteristics as the presence of bare human skin.

There is a Save button that protrudes slightly from a frame that surrounds the “picture”. When the user sees an image he particularly likes, he presses Save, and then that image is stored into persistent storage by a CPU which is embedded into the device and displayed later. The CPU also makes use of that information to improve a local taste profile representing the user's tastes. This enables the device to acquire more visual items that the user will enjoy, as described above for music.

Another set of variations of the invention, as it relates to one-way devices but also as it relates to the purely two-way node embodiments described elsewhere in this specification, is similarity matching by means of pattern-matching technologies. For example, a song would be represented, instead of (or in addition to) a taste profile containing a list of attributes, by pattern-matching software. For instance, it could be represented by a neural net, with the number of layers and nodes and the numerical values that are intrinsic to the net being defined in a way that takes a local user's taste profile information and outputs a high value if the song is likely to match the user's tastes and a low value if it is not. As one way of finding the necessary values, the neural net can be trained using taste profile data of users who had two-way connections enabling the profiles to be communicated to a central server. The neural net is trained so that it takes the taste profiles as input, and outputs high or low values depending on whether the user currently being trained on liked the song or not (for instance based on whether he pressed a Save button, did nothing, or “fast forwarded” past a song and never listened to it in its entirety; in such a case the net would preferably be trained to output a numerical value with a high value in the first case, a middle value in the second, and a low value in the third). In order that there are input values to train the neural nets, taste profiles based on song attributes are also provided, and a user taste profile to be input into the artificial intelligence unit is generated based on the ones associated with the songs the user likes (and/or does not like).

In some embodiments content items such as songs are accompanied by lists of identifiers of other songs that are considered likely to be enjoyed by the same people as the current song, as determined, for example, either by having similar sonic and/or lyrical attributes, or by tending to be liked (or purchased) by the same people. These identifiers may be used as attributes for nearest-neighbor matching, but they may also be used as simple indicators: the listed song identifiers may be used to schedule the acquisition of those other songs if the user likes the current one (as indicated, for example, by pressing a Save button while listening to it).

In some embodiments incorporating the virtual channel concept described above, when the user first starts the player and selects a virtual channel, if there aren't many songs (or are no songs) stored in the device yet, it may start playing the currently-being-broadcast song from one of the user's favorite channels, and follow that up with a song from storage if one is available, or otherwise play another song from one of the user's favorite channels.

When playing songs from the user's favorite channels, it may receive broadcasts from more than one channel at a time, and play one song while simultaneously caching another song from another channel into RAM or persistent storage; after the first song is complete it may play a song from another channel.

APPENDIX 6

This Appendix describes a class of embodiments wherein there is two-way communication between nodes, but it is limited to a particular geographical area, being enabled by such wireless technologies as Wi-Fi or Bluetooth or the like.

When two wireless-enabled portable devices are in close enough proximity that communications may be automatically established, a link is set up between the two devices. (Communications may occur between more than two devices simultaneously, but for simplicity of example we are focusing on the interactions between one pair at a time.) For instance, a link may be established between two devices in different automobiles or between two handheld devices such as cellular phones.

All or a substantial portion of the music library identifiers in each device comprises the taste profile of that device. It is communicated to the other device by wireless means. The similarity of the other user to the local user is calculated by means of the taste profiles (by local user we mean the person whose information is in one of the two devices). If the other user's taste profile makes it one of the N most similar ones seen by the local user's device, where N is a predetermined number, the taste profile is stored and used for recommendation purposes as described elsewhere in this specification.

Note that very similar, but older, taste profiles may be deleted, and thus there may be more than N chosen for storage over the course of time.
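Keeping only the N most similar profiles seen so far can be sketched with a bounded min-heap. The Jaccard overlap of library identifiers is used here as a stand-in similarity measure, and NeighborStore, offer, and neighbors are hypothetical names; the specification leaves the metric and the data structures open.

```python
import heapq


def jaccard(a, b):
    """Overlap of two sets of song identifiers, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class NeighborStore:
    """Retains the N most similar taste profiles encountered so far."""

    def __init__(self, local_library, n):
        self.local = set(local_library)
        self.n = n
        self._heap = []   # min-heap: the least similar kept profile is on top
        self._seq = 0     # unique tie-breaker so profiles are never compared

    def offer(self, user_id, library):
        """Consider a profile received over the wireless link.
        Returns True if the profile was stored."""
        if self.n <= 0:
            return False
        score = jaccard(self.local, library)
        self._seq += 1
        item = (score, self._seq, user_id, frozenset(library))
        if len(self._heap) < self.n:
            heapq.heappush(self._heap, item)
            return True
        if score > self._heap[0][0]:
            heapq.heapreplace(self._heap, item)  # evict the least similar
            return True
        return False

    def neighbors(self):
        """User ids of stored profiles, most similar first."""
        return [uid for _, _, uid, _ in sorted(self._heap, key=lambda t: -t[0])]
```

Each encounter costs one similarity computation and at most one heap update, which suits a low-power portable device.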

In preferred embodiments, for a subscription fee, the devices are allowed to copy music from one device to another. If a device to which the local user is currently connected has a highly recommended song on it, the song is transferred to the local device either automatically or after suggesting the transfer and waiting for the user to OK it (for instance, by pressing a button on the device in response to an onscreen notification). In other embodiments, the device keeps track of how many times the user has played a song, and to play it more than (for instance) three times, the user must buy the track. This transaction occurs at the time the device is connected to the wired Internet (either through a wireless base station, a direct Ethernet connection, or a connection via USB, FireWire, or the like to a desktop PC which is connected to the Internet).

In further preferred embodiments, the data corresponding to each song may contain an indicator (such as a bit or particular byte value) indicating that certain songs are free; in other words, they can be legally transferred between devices without legal or copyright hindrance. In that case transfers occur as described above but, in the absence of a paid subscription, only the free songs may be transferred.

Practitioners of the art of creating wireless networking hardware and software, such as Bluetooth and Wi-Fi, will readily see how to handle the connectivity aspects described in this Appendix.

INDUSTRIAL APPLICABILITY

The present invention is desirably implemented at least in part via a public network or internet, although some embodiments make use of satellite transmissions and/or wireless transmissions directly from device to device. It may, for example, be coupled to a private network or intranet through a firewall server or router. As used herein, the term “internet” generally refers to any collection of distinct networks working together to appear as a single network to a user. The term “Internet”, on the other hand, refers to a specific implementation of internet, the so-called world wide “network of networks” that are connected to each other using the Internet protocol (IP) and other similar protocols. The Internet provides file transfer, remote log in, electronic mail, news and other services. The system and techniques described herein can be used on any internet including the so-called Internet.

One of the unique aspects of the Internet system is that messages and data are transmitted through the use of data packets referred to as “datagrams.” In a datagram-based network, messages are sent from a source to a destination in a manner similar to a government mail system. For example, a source computer may send a datagram packet to a destination computer regardless of whether or not the destination computer is currently powered on and coupled to the network. The Internet protocol (IP) is completely sessionless, such that IP datagram packets are not associated with one another.

The firewall server or router is a computer or item of equipment which couples the computers of a private network to the Internet. It may thus act as a gatekeeper for messages and datagrams going to and from the Internet 1.

An Internet service provider (ISP) is also coupled to the Internet. A service provider is an entity that provides connections to a part of the Internet, for a plurality of users. Also coupled to the Internet are a plurality of web sites or nodes. When a user wishes to conduct a transaction at one of the nodes, the user accesses the node through the Internet.

For Internet-enabled embodiments, each node is configured to understand which firewall and node to send data packets to given a destination IP address. This may be implemented by providing the firewalls and nodes with a map of all valid IP addresses disposed on its particular private network or another location on the Internet. The map may be in the form of prefix matches up to and including the full IP address.
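A map "in the form of prefix matches up to and including the full IP address" could be consulted as sketched below, using Python's standard ipaddress module. Choosing the longest matching prefix is an assumption borrowed from conventional routing practice, and next_hop, the example addresses, and the hop labels are hypothetical.

```python
import ipaddress


def next_hop(route_map, destination):
    """Return the hop for the most specific prefix containing destination,
    or None when no prefix in the map matches."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, hop in route_map.items():
        net = ipaddress.ip_network(prefix)
        if addr.version == net.version and addr in net:
            # Assumption: prefer the longest (most specific) matching prefix.
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, hop)
    return best[1] if best else None
```

A /32 entry corresponds to a match on the full IPv4 address, while shorter prefixes stand for whole private networks reachable behind a firewall.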

Also coupled to the Internet is a server, containing an information database with representations of user profiles and associated user identifiers 5. The information may be stored, for example, as a record or as a file. The information associated with each particular user is stored in a particular data structure in a database. One exemplary database structure is as follows. The database may be stored, for example, as an object-oriented database management system (ODBMS), a relational database management system (e.g. DB2, SQL, etc.), a hierarchical database, a network database, a distributed database (i.e. a collection of multiple, logically interrelated databases distributed over a computer network) or any other type of database package. Thus, the database and the system can be implemented using object-oriented technology or via text files.

A computer system on which the system of the present invention may be implemented may be, for example, a personal computer running Microsoft Windows, Linux, Apple Macintosh or an equivalent operating system. Such a computer system typically includes a central processing unit (CPU), e.g., a conventional microprocessor, a random access memory (RAM) for temporary storage of information, and a read only memory (ROM) for permanent storage of information. Each of the aforementioned components is coupled to a bus. The operating system controls allocation of system resources and performs tasks such as processing, scheduling, memory management, networking, and I/O services. Also coupled to the bus is typically a non-volatile mass storage device which may be provided as a fixed disk drive which is coupled to the bus by a disk controller.

Data and software may be provided to and extracted from the computer system via removable storage media such as hard disk, diskette, and CD ROM. For example, data values generated using techniques described herein may be stored on storage media. The data values may then be retrieved from the media by the CPU and utilized to recommend one of a plurality of items in response to a user's query.

Alternatively, computer software useful for performing computations related to enabling recommendations and community by massively-distributed nearest-neighbor searching may be stored on storage media. Such computer software may be retrieved from the media for immediate execution by the CPU or by processors included in one or more peripherals. The CPU may retrieve the computer software and subsequently store the software in RAM or ROM for later execution.

User input to the computer system may be provided by a number of devices. For example, a keyboard and a mouse are typically coupled to the bus by a controller. The computer system typically also includes a communications adapter which allows the system to be interconnected to a local area network (LAN) or a wide area network (WAN). Connections may be wireless or wired. Thus, data and computer program software can be transferred to and from the computer system via the adapter, bus and network; although it should be noted that in embodiments without two-way connectivity, the device manufacturer may load the software onto the device.

1. A networked computer system for supplying recommendations and taste-based community to a target user, comprising: networked means for providing representations of nearest neighbor candidate taste profiles and associated user identifiers in an order such that said nearest neighbor candidate taste profiles tend to be at least as similar to a taste profile of the target user according to a predetermined similarity metric as are subsequently retrieved ones of said nearest neighbor candidate taste profiles, means to receive said representations of nearest neighbor candidate taste profiles and associated user identifiers on at least one neighbor-finding user node, said neighbor-finding user nodes each having at least one similarity metric calculator calculating said predetermined similarity metric based upon said representations of nearest neighbor candidate taste profiles, at least one selector residing on at least one of said neighbor-finding user nodes using the output of said at least one similarity metric calculator for building a list representing the nearest-neighbor users, said list representing said nearest-neighbor users providing access to associated ones of said candidate profiles, a nearest-neighbor based recommender which uses said associated ones of said candidate profiles to recommend items, a display for viewing identifiers of recommended items, a display for viewing identifiers of a plurality of nearest neighbor users, means to select at least one of said nearest neighbor users from said display of identifiers of a plurality of nearest neighbor users, a display of information relating to at least one of the items in said nearest neighbor user's collection, whereby massively distributed processing is harnessed in a bandwidth-conserving way for finding the best neighbors out of the entire population of users, and the same neighborhood is leveraged to provide recommendations as well as highly focused taste-based community for sharing the enjoyment of items including recommended items.
2. The networked computer system of claim 1, further including means to facilitate communication with at least said nearest neighbor users where the type of communication comprises at least one selected from the group consisting of online chat, email, online discussion boards, voice, and video.
3. A networked computer system for supplying recommendations and taste-based community to a target user, comprising an ordered plurality of nearest neighbor candidate taste profiles and associated user identifiers such that said nearest neighbor candidate taste profiles tend to be at least as similar to a taste profile of the target user according to a predetermined similarity metric as are subsequently positioned ones of said nearest neighbor candidate taste profiles, networked means to receive said nearest neighbor candidate taste profiles and associated user identifiers on at least one neighbor-finding user node, said neighbor-finding user nodes each having at least one similarity metric calculator calculating said predetermined similarity metric, at least one selector residing on at least one of said neighbor-finding user nodes using the output of said at least one similarity metric calculator for building a list representing the nearest-neighbor users, said list representing said nearest-neighbor users providing access to associated ones of said candidate profiles, a nearest-neighbor based recommender which uses said associated ones of said candidate profiles to recommend items, a display for viewing identifiers of recommended items, a display for viewing identifiers of a plurality of nearest neighbor users, means to select at least one of said nearest neighbor users from said display of identifiers of a plurality of nearest neighbor users, a display of information relating to at least one of the items in said nearest neighbor user's collection, whereby massively distributed processing is harnessed in a bandwidth-conserving way for finding the best neighbors out of the entire population of users, and the same neighborhood is leveraged to provide recommendations as well as highly focused taste-based community for sharing the enjoyment of items including recommended items.
4. The networked computer system of claim 1, further including a single downloadable file that contains software that executes all necessary non-server computer instructions.